ABSTRACT


The purpose of this study is to review the problem of multicollinearity in regression analysis. Specifically, it discusses the difficulties that arise when multicollinearity is present, the alternative procedures available for detecting the problem, and the methods by which it may be resolved. It is shown that an error exists in Kirchdorfer's method of detection, which renders the procedure invalid.


In considering the strengths and weaknesses of each method of detection or resolution, what crystallizes is the view that no highly satisfactory means of treating the problem has yet appeared. An illustrative application of the theory is obtained by using Farrar and Glauber's techniques to detect multicollinearity in a sample of economic data. Hoerl and Kennard's ridge trace is then constructed, followed by calculation of a shrunken estimator to remedy the detected multicollinearity.
CHAPTER I


INTRODUCTION 


One of the most vexing problems in multiple regression analysis is that of multicollinearity, a term used to denote the presence of near linear relationships among the "independent" variables. Although econometricians and others seldom face perfect multicollinearity, that is, a situation in which one or more variables are exact linear combinations of other variables, high intercorrelation is nevertheless often unavoidable, because economic variables are not generated under experimentally controlled conditions. It would therefore be of considerable value to investigate the problem of multicollinearity and the difficulties associated with a multicollinear set of data. Such was the intent of this study.


1.1 The Regression Model

The model on which our discussion centres is the familiar linear multiple regression equation

$$y = X\beta + u \qquad (1)$$

where $X$ is an $n \times m$ matrix of $n$ observations on $m$ "independent" variables, with $\operatorname{rank} X = r \le m < n$, $\beta$ is an $m \times 1$ vector of unknown parameters and $u$ is a vector of disturbances.
The minimal assumptions underlying the least squares theory are as follows: the elements of $u$ are independently distributed random variables with mean zero and constant variance $\sigma^2$.


According to the theory of least squares, we minimize $(y - X\beta)'(y - X\beta)$ and obtain the normal equations

$$X'X\hat\beta = X'y \qquad (2)$$


Two cases can be considered, depending on the singularity or nonsingularity of $X'X$.

(I) If $\operatorname{rank} X = r = m$, $(X'X)^{-1}$ exists and the least squares estimator is given by

$$\hat\beta = (X'X)^{-1}X'y.$$


The estimates $\hat\beta$ are unbiased and consistent; moreover, as the Gauss-Markov theorem states, they are efficient in the class of linear unbiased estimators. A simplified proof of the theorem is presented below.


The Gauss-Markov Theorem. In the classical linear regression model, the best linear unbiased estimator of $\beta$ is the least squares vector $\hat\beta = (X'X)^{-1}X'y$.


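Proof (Plackett [44]). Let $Wy$ be any unbiased estimator of $\beta$, so that $E(Wy) = WX\beta = \beta$ for all $\beta$, i.e. $WX = I$. Thus we can write

$$(X'X)^{-1} = WX(X'X)^{-1}$$

and obtain the identity

$$WW' = \left[W - (X'X)^{-1}X'\right]\left[W - (X'X)^{-1}X'\right]' + (X'X)^{-1}.$$

Since $\operatorname{var}(Wy) = \sigma^2 WW'$ and the diagonal elements of the first term on the right are sums of squares, the diagonal elements of $WW'$ are least when $W = (X'X)^{-1}X'$, which is the solution provided by least squares.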


Adding to (1) the assumption that $u$ is normally distributed, the results for the classical least squares model carry over: $\hat\beta$ has the same mean and variance as before, and the residual sum of squares $\hat u'\hat u$ is distributed as $\sigma^2\chi^2$ with $n - m$ degrees of freedom. In addition, $\hat\beta = (X'X)^{-1}X'y$ is now normally distributed, since it is a linear form in a normally distributed vector. It is also a uniformly minimum variance unbiased estimator of $\beta$.

Since the likelihood function of the sample is

$$L = (2\pi\sigma^2)^{-n/2}\exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right],$$
maximizing it with respect to $\beta$ is equivalent to choosing $\beta$ such that $(y - X\beta)'(y - X\beta)$ is minimized. As this is precisely the least squares criterion established earlier, the maximum likelihood estimator is simply the least squares estimator. $\hat\beta$ is thus consistent and asymptotically efficient. The maximum likelihood estimator of $\sigma^2$ can be obtained as $\hat u'\hat u/n$.
(II) If $\operatorname{rank} X = r < m$, no unbiased estimator of $\beta$ exists. However, a class of linear functions $L'\beta$, where $L$ is an $m \times 1$ vector, may have unbiased estimators. These are the so-called "estimable functions". The estimable functionals $L'$ are characterized by the property that a solution to the equations $a'X = L'$ exists, i.e. they are vectors in the row space of $X$. We have the following result.


Theorem 1.1.1. The best linear unbiased estimator of $L'\beta$ is $L'\hat\beta$, where $\hat\beta$ is any solution to the normal equations (2).


$L'\hat\beta$ may be expressed as $L'(X'X)^{g}X'y$, where $(X'X)^{g}$ is any generalized inverse of $X'X$. A generalized inverse of an $n \times m$ matrix $A$ of any rank is an $m \times n$ matrix $A^{g}$ such that, for any vector $Y$ for which $AX = Y$ is a consistent equation, $X = A^{g}Y$ is a solution. Penrose [43] shows that for any matrix $A$ there exists a unique matrix $A^{P}$ satisfying the four conditions:

(i) $AA^{P}A = A$
(ii) $A^{P}AA^{P} = A^{P}$
(iii) $(AA^{P})' = AA^{P}$
(iv) $(A^{P}A)' = A^{P}A$.

$A^{P}$ is referred to as the Moore-Penrose inverse. However, a solution to $AX = Y$, where $A$ is a singular square or rectangular matrix, requires only a generalized inverse which satisfies condition (i). Such a matrix is called a $g_1$-inverse of $A$ and denoted by $A^{g_1}$. Likewise, the $g_2$- and $g_3$-inverses of $A$, denoted by $A^{g_2}$ and $A^{g_3}$, are defined respectively by the first two and the first three of these conditions.


Before proceeding with the proof of Theorem 1.1.1, we first state the following lemma.

Lemma 1.1.1. A matrix $G$ is a $g_3$-inverse of $X$ iff it can be written as

$$G = (X'X)^{g_1}X'.$$


Proof of Theorem 1.1.1 (Chipman [9]). Let $\tilde\beta = My + d$. For $L'\tilde\beta$ to be an unbiased estimator of $L'\beta$, we require

$$E(L'\tilde\beta) = E(L'My + L'd) = L'MX\beta + L'd = L'\beta. \qquad (3)$$

Condition (3) is satisfied for all $\beta$ iff

$$L'MX = L' \quad \text{and} \quad L'd = 0,$$

or, writing $L' = a'X$,

$$a'XMX = a'X.$$

Thus for $M = X^{g_1}$, $L'My$ is an unbiased estimator of an estimable function $L'\beta$. Now
We wish to find an $A$ which minimizes (4) subject to $XAX = X$.

Thus $XAA'X'$ is minimal when $A$ is a $g_3$-inverse of $X$. Since

$$(X'X)^{g_1}X' = X^{g_3}$$

(Lemma 1.1.1) and any generalized inverse is a $g_1$-inverse, the proof is completed.


The two cases discussed above may be treated within a unified framework. If $X'X$ is of full rank, $(X'X)^{g} = (X'X)^{-1}$. Moreover, as Rao [48] has pointed out, $(X'X)^{g}X'y$ and $\sigma^2(X'X)^{g}$ may be regarded as "estimates of $\beta$ and the dispersion matrix of estimates respectively, for purposes of building up an estimate of any estimable function $L'\beta$ and determining its variance".


1.2 The Multicollinearity Problem

Linear relationships among the independent variables may exist in various forms, either between pairs of independent variables or in a more complicated manner involving several members of the independent set. In general, such intercorrelation results in:

(1) inaccurate estimation of parameters due to large sampling variances of the coefficients.


As the columns of $X$ become increasingly collinear, the matrix $X'X$ approaches singularity, and its inverse acquires some very large diagonal elements. In the limiting case, the determinant of $X'X$ is zero and its inverse does not exist, leading to a completely indeterminate set of parameter estimates.


(2) uncertain specification of the model with respect to the inclusion of variables, and a danger that relevant variables will be discarded incorrectly. For example, if the $i$-th diagonal element of $(X'X)^{-1}$ is large, $x_i$ may appear to be statistically insignificant even if it is important in the true relation.

(3) estimates of coefficients that become very sensitive to slight changes in the data sample.
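The growth of the diagonal elements of $(X'X)^{-1}$ described in (1) can be seen in the simplest possible case. The following Python sketch (NumPy is our choice of tool; the helper name `inverse_diagonal` is hypothetical) assumes two standardized regressors with sample correlation $r$, so that $X'X$ is the $2 \times 2$ correlation matrix and each diagonal element of its inverse equals $1/(1 - r^2)$:

```python
import numpy as np


def inverse_diagonal(r: float) -> np.ndarray:
    """Diagonal of the inverse of the 2 x 2 correlation matrix [[1, r], [r, 1]]."""
    c = np.array([[1.0, r], [r, 1.0]])
    return np.diag(np.linalg.inv(c))


# As r approaches 1 the diagonal elements of the inverse, and with them the
# sampling variances of the coefficients, grow without bound.
for r in (0.0, 0.5, 0.9, 0.99, 0.999):
    d = inverse_diagonal(r)
    print(f"r = {r:5.3f}   diagonal element = {d[0]:10.2f}   1/(1 - r^2) = {1.0 / (1.0 - r * r):10.2f}")
```

Already at $r = 0.99$ the variance of each coefficient is about fifty times what it would be with orthogonal regressors.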


As a simple demonstration of the third difficulty, we consider the data in Table I on imports, production, stock formation and consumption, obtained from the French national accounts. These data reveal an approximate multicollinearity between production and consumption, namely between $x_2$ and $x_4$.

Using the least squares computer program MLREGR [42], we obtain
TABLE I

Imports, Production, Stock Formation and Consumption in France
(in Milliards of New Francs at 1956 Prices)

         y          x2            x3           x4
Year     Imports    Gross         Stock        Consumption
                    Domestic      Formation
                    Production
1949     15.9       149.3         4.2          108.1
1950     16.4       161.2         4.1          114.8
1951     19.0       171.5         3.1          123.2
1952     19.1       175.5         3.1          126.9
1953     18.8       180.8         1.1          132.1
1954     20.4       190.7         2.2          137.7
1955     22.7       202.1         2.1          146.0
1956     26.5       212.4         5.6          154.1
1957     28.1       226.1         5.0          162.3
1958     27.6       231.9         5.1          164.3
1959     26.3       239.0         0.7          167.6
1960     31.1       258.0         5.6          176.8
1961     33.3       269.8         3.9          186.6
1962     37.0       288.4         3.1          199.7
1963     43.3       304.5         4.6          213.9
1964     49.0       323.4         7.0          223.8
1965     50.3       336.8         1.2          232.0
1966     56.6       353.9         4.5          242.9

Source: E. Malinvaud, Statistical Methods of Econometrics (North-Holland: 1971).
and the estimated relation

$$\hat y = \ldots - 0.06788\,x_2 + 0.58914\,x_3 + 0.34725\,x_4$$

with t values (-0.72478), (2.90977) and (2.44402) for the coefficients of $x_2$, $x_3$ and $x_4$ respectively; t values are given in parentheses. The squared multiple correlation coefficient is $R^2 = 0.9847$.


Suppose now the original model is re-estimated from data for 18 years, namely 1949 to 1966, instead of the original 15 years, 1949 to 1963. A different set of parameter estimates is obtained, and the estimated relation becomes

$$\hat y = \ldots + 0.03210\,x_2 + 0.41421\,x_3 + 0.24293\,x_4$$

where the numbers in parentheses, (0.17193), (1.28690) and (0.85253), again refer to t values. Here $R^2 = 0.9731$, and the sample correlation matrix of $x_2$, $x_3$ and $x_4$ is obtained as


$$C = \begin{pmatrix} 1.00000 & 0.21545 & 0.99893 \\ 0.21545 & 1.00000 & 0.21369 \\ 0.99893 & 0.21369 & 1.00000 \end{pmatrix}$$

Table II gives the ratio of the original to the revised estimates for the two parameters $\beta_3$ and $\beta_4$.


It can be seen that both $\hat\beta_3$ and $\hat\beta_4$ vary by more than 40%, while the coefficient $\hat\beta_2$ of $x_2$ in the revised model turns out to be positive. With the addition of three further years' data, the coefficients $\hat\beta_3$ and $\hat\beta_4$, formerly significant at the 5% level ($t_{0.975,11} = 2.201$), are now insignificant ($t_{0.975,14} = 2.145$). Thus extension of the sample period has produced dramatic changes in the estimated relationship.
TABLE II

Ratio of Original to Revised Estimates

Parameter        (Original Estimate / Revised Estimate) x 100
$\hat\beta_3$    142
$\hat\beta_4$    143
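The entries of Table II follow directly from the coefficients of the two fitted relations above. A quick arithmetic check in Python (the dictionary names are illustrative only):

```python
# Coefficients of x3 and x4 from the 1949-1963 and 1949-1966 fits reported above.
original = {"beta3": 0.58914, "beta4": 0.34725}
revised = {"beta3": 0.41421, "beta4": 0.24293}

# (original estimate / revised estimate) x 100, rounded as in Table II.
ratios = {k: round(100 * original[k] / revised[k]) for k in original}
print(ratios)  # {'beta3': 142, 'beta4': 143}
```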
Difficulties at the computational level also arise in situations where multicollinearity is very severe, that is, when the determinant of $X'X$ is very close to, but not exactly, zero. The observations of Klein and Nakamura [31] in this regard include the fact that, while the elements of the inverse matrix $(X'X)^{-1}$ in the two-variable case can still be calculated with sufficient accuracy by carrying enough digits at each computational stage, accurate computation is considerably more difficult to attain when the matrix is 5 by 5 or larger. Indeed, given the sort of intercorrelation frequently found in econometric data, they note the virtual impossibility of calculating the inverses of 30 by 30 matrices, even with the most powerful electronic computer available.
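Klein and Nakamura's point about accuracy can be reproduced in modern double-precision arithmetic. As an assumption of our own, not from the source, the sketch below uses the Hilbert matrix, a textbook example of a nearly singular positive definite matrix, as a stand-in for a highly intercorrelated $X'X$; the helper `hilbert` is hypothetical:

```python
import numpy as np


def hilbert(k: int) -> np.ndarray:
    """k x k Hilbert matrix, H[i, j] = 1 / (i + j + 1)."""
    idx = np.arange(k)
    return 1.0 / (idx[:, None] + idx[None, :] + 1.0)


errors = {}
for k in (2, 5, 8, 11):
    h = hilbert(k)
    exact = np.ones(k)
    solved = np.linalg.solve(h, h @ exact)  # the exact answer is a vector of ones
    errors[k] = float(np.max(np.abs(solved - exact)))
    print(f"k = {k:2d}   condition number = {np.linalg.cond(h):9.2e}   max error = {errors[k]:.1e}")
```

The error grows roughly in step with the condition number, so that by $k = 11$ only a few correct digits survive even in 64-bit floating point.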
CHAPTER II


DETECTION OF MULTICOLLINEARITY 


Two practical issues arise in connection with multicollinearity and its treatment. First, how is its existence to be detected and its severity established? Second, how serious must multicollinearity be before it can be considered "harmful"? In attempting to answer these questions, several researchers have suggested various measures of multicollinearity and possible means of detection. Their efforts are discussed below.


2.1 Measures of Multicollinearity

A common measure of the degree of multicollinearity is the value of the determinant of $X'X$, where $X$ is here normalized so that each column has zero mean and unit length. $X'X$ is accordingly the sample correlation matrix $C$. The value of $|X'X|$ ranges from 0 to 1, increasing as multicollinearity becomes less severe.
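Using the 1949-1966 correlation matrix $C$ of $x_2$, $x_3$ and $x_4$ reported in Chapter I, the determinant measure can be computed directly; a minimal NumPy sketch:

```python
import numpy as np

# Correlation matrix of x2, x3, x4 for 1949-1966, as reported in Chapter I.
C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])

det_c = np.linalg.det(C)
print(f"|C| = {det_c:.6f}")                      # close to 0: severe multicollinearity
print(f"|I| = {np.linalg.det(np.eye(3)):.1f}")   # orthogonal regressors give 1
```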


Another measure of multicollinearity is the von Neumann and Goldstine [58] condition number

$$\kappa = \frac{\lambda_1}{\lambda_m},$$

where $\lambda_1$ is the largest and $\lambda_m$ the smallest of the eigenvalues of $X'X$. If the columns of $X$ are orthogonal, $\kappa$ will be 1. However, as they become collinear, $\lambda_m$ will become very small, so that $\kappa$ becomes quite large. As $\kappa$ increases, the probability of significant error in the estimates also increases.


Between these two measures, $\kappa$ is to be preferred owing to its more direct relationship with the effect of multicollinearity. Notwithstanding this, both measures share two disadvantages: neither gives information about the pattern of interdependence, and in neither can absolute conceptions of bigness or smallness be fixed. Nevertheless, in the case of $\kappa$, some guidelines may be obtained from the fact that for a correlation matrix $\lambda_1$ has a minimum value of 1 and a maximum value of $m$.
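A matching sketch for the condition number, taking it as the ratio of the largest to the smallest eigenvalue and again using the correlation matrix $C$ of Chapter I:

```python
import numpy as np

C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])

eigenvalues = np.linalg.eigvalsh(C)        # ascending order for a symmetric matrix
kappa = eigenvalues[-1] / eigenvalues[0]   # largest over smallest eigenvalue

identity_eigs = np.linalg.eigvalsh(np.eye(3))
kappa_orthogonal = identity_eigs[-1] / identity_eigs[0]   # orthogonal case

print(eigenvalues)
print(f"kappa = {kappa:.1f}   orthogonal kappa = {kappa_orthogonal:.1f}")
```

Because the trace of a correlation matrix equals $m$, the largest eigenvalue necessarily lies between 1 and $m$ (here 3), which is what makes $\kappa$ easier to calibrate than the determinant.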


Multicollinearity can be localized by calculating the determinant of $X'X$ and those of $(X'X)_{(i)}$, $i = 1, \ldots, m$, which are the matrices obtained by omitting each of the independent variables in turn. If $x_i$ is orthogonal to the other members of $X$, then

$$\frac{|(X'X)_{(i)}|}{|X'X|} = r^{ii}$$

would have the value 1, where $r^{ii}$ denotes the $i$-th diagonal element of $(X'X)^{-1}$. This can be shown to be its minimum value in the following way. Defining $R_i^2$ as the squared multiple correlation between $x_i$ and the other members of $X$, we have

$$r^{ii} = \frac{1}{1 - R_i^2}$$

and

$$0 \le R_i^2 \le 1.$$

Thus

$$\frac{|(X'X)_{(i)}|}{|X'X|} \ge 1.$$

When $x_i$ is perfectly dependent on the remaining members of $X$, $|X'X|$ vanishes while $|(X'X)_{(i)}|$, since it does not contain $x_i$, remains unaffected. In this case $r^{ii} = \infty$. If perfect linear dependency exists in $(X'X)_{(i)}$ as well, then $r^{ii} = 0/0$, which is indeterminate. However, one would not have proceeded to localize multicollinearity if the determinant had been found to be exactly 0 in the first place. Therefore the size of the diagonal elements of $(X'X)^{-1}$, $1 \le r^{ii} < \infty$, is a good indicator of the location of the problem.
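The localization argument can be checked numerically on the correlation matrix $C$ of Chapter I. In this sketch the names `r_ii`, `det_ratio` and `r_squared` are our own; it computes $r^{ii}$ both from the inverse and as the ratio of determinants, and recovers $R_i^2 = 1 - 1/r^{ii}$:

```python
import numpy as np

C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])
names = ["x2", "x3", "x4"]
det_c = np.linalg.det(C)

r_ii = np.diag(np.linalg.inv(C))           # diagonal elements r^{ii} of the inverse
det_ratio = np.array([
    np.linalg.det(np.delete(np.delete(C, i, axis=0), i, axis=1)) / det_c
    for i in range(3)
])                                          # |C with x_i omitted| / |C|
r_squared = 1.0 - 1.0 / r_ii               # R_i^2 of x_i on the other regressors

for name, a, b, r2 in zip(names, r_ii, det_ratio, r_squared):
    print(f"{name}: r^ii = {a:8.2f}   determinant ratio = {b:8.2f}   R_i^2 = {r2:.5f}")
```

The two large values point to $x_2$ and $x_4$, while $r^{ii}$ for $x_3$ stays close to its minimum of 1.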


To decide whether multicollinearity is harmful, a number of rules of thumb have been proposed. Farrar and Glauber [15] suggest the rule

$$R^2 > R_i^2 \quad \text{for all } i.$$

In other words, the overall $R^2$ should exceed the highest $R_i^2$ of any regression of one independent variable on its counterparts. In a recent review, Raduchel [46] has explicated this rule in terms of the generalized variance of the coefficients, that is, the determinant of their variance-covariance matrix. Defining $\rho_i$ as the ratio of the generalized variance of the coefficients of a regression including $x_i$ to the generalized variance of the coefficients of a regression excluding it, and applying standard theorems on correlation coefficients, he obtains

$$\rho_i = (1 - r_i^2)^{m-1}\,\frac{1 - R^2}{1 - R_i^2},$$

where $r_i$ is the partial correlation of $y$ and $x_i$, given the influence of the remaining independent variables.


Farrar and Glauber's rule of thumb therefore guarantees that all $\rho_i$ will be less than 1. As a modification of this rule, Haitovsky [22] has suggested that the comparison should instead be made between the partial correlation coefficients of all pairs of independent variables and the overall $R^2$. His views will be published in a forthcoming paper [23].
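Applied to the French data of Chapter I, the Farrar-Glauber rule can be checked in a few lines. This sketch reads the rule as $R^2 > \max_i R_i^2$, takes the overall $R^2 = 0.9731$ of the 1949-1966 regression, and derives each $R_i^2$ from the correlation matrix $C$:

```python
import numpy as np

C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])
overall_r2 = 0.9731                            # R^2 of the 1949-1966 regression
r2_i = 1.0 - 1.0 / np.diag(np.linalg.inv(C))   # R_i^2 of each regressor on the others

harmful = bool(overall_r2 <= r2_i.max())
print(f"max R_i^2 = {r2_i.max():.5f}   overall R^2 = {overall_r2}   harmful: {harmful}")
```

By this rule the multicollinearity in the 1949-1966 sample counts as harmful, since $x_2$ and $x_4$ are explained by the other regressors better than $y$ is.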


Turning now to the subject of detection, a comprehensive search of the literature reveals that as many as four methods have been proposed since 1934. The earliest attempt to deal with the problem goes back to Frisch [18].
2.2 Frisch Bunch Map Analysis

Essentially, the basic idea of this technique is the determination of the regression plane by minimizing the residual sums of squares in various directions, for each of all possible subsets as well as for the full set of variables including the dependent variable $y$ (usually denoted by $x_1$).

The construction of bunch maps for any subset $x_1, x_2, \ldots, x_k$, $2 \le k < m + 1$, or for the full set involves only two variables in each map. For example, a subset consisting of three variables will have 3 bunch maps. Each bunch map for a pair $(x_i, x_j)$ consists of $k$ beams, the $l$-th beam having slope $-R_{li}/R_{lj}$, where $i < j$ and $R_{ij}$ denotes the cofactor of $r_{ij}$ in the correlation matrix $C$.

Using standardized variables, it can easily be shown that $R_{li}/R_{lj}$ is simply the ratio of the coefficients of $x_i$ and $x_j$ in the regression of $x_l$ on $x_1, \ldots, x_{l-1}, x_{l+1}, \ldots, x_k$.
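The cofactor identity behind the beam slopes is easy to verify numerically. A sketch, assuming randomly generated (hypothetical) data for four standardized variables; `regression_coefficients` is our own helper, and the beam slope is computed as $-R_{li}/R_{lj}$ from the cofactor matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
base = rng.normal(size=(50, 1))
data = base + 0.5 * rng.normal(size=(50, 4))   # hypothetical correlated series x_1..x_4
R = np.corrcoef(data, rowvar=False)            # correlation matrix (not the restriction matrix of Ch. III)

# Cofactor matrix of the symmetric, nonsingular R: det(R) * inv(R).
cof = np.linalg.det(R) * np.linalg.inv(R)


def regression_coefficients(R: np.ndarray, l: int) -> np.ndarray:
    """Standardized coefficients of x_l regressed on all other variables (0 at position l)."""
    others = [k for k in range(R.shape[0]) if k != l]
    b = np.zeros(R.shape[0])
    b[others] = np.linalg.solve(R[np.ix_(others, others)], R[others, l])
    return b


l, i, j = 0, 1, 2
b = regression_coefficients(R, l)
slope = -cof[l, i] / cof[l, j]                 # slope of the l-th beam in the (x_i, x_j) map
print(b[i] / b[j], cof[l, i] / cof[l, j], slope)
```

The first two printed numbers agree, which is the identity stated above; the beam slope is their negative.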


Following construction of the bunch maps, each bunch is compared with the corresponding bunch in the subsets of the set considered, the comparison being in terms of the dispersion of the beams or their lengths. When the inclusion of a variable renders the new bunch more widely spread, the variable is deduced to be correlated with the other variables in the bunch. Conversely, if the variable added possesses a very short beam relative to the other beams, it is orthogonal to the other variables. A theoretical explanation of these deductions has been given by Malinvaud [36].
An elucidation of the use of Frisch's bunch map analysis is provided by the bunch maps (Figure 1) which have been constructed from the data in Table III. Focussing first on the bunch (24), we see that it remains more or less unchanged upon the inclusion of variable No. 3. In addition, the beam corresponding to variable No. 3 is extremely short. We therefore conclude that variable No. 3 is approximately orthogonal to the other variables. Multicollinearity, on the other hand, is exemplified by the behaviour of the bunch (123) when variable No. 4 is added to it. Since the bunch becomes less tight, we deduce that variable No. 4 is correlated with the other variables in the set.


To a certain extent, then, Frisch's technique involves subjectivity in the interpretation of the bunch maps. This lack of precision, as well as the laborious calculation required for all the cofactors, has rendered the technique obsolete. Another early technique, which fared no better, was developed by Tintner [54].


2.3 Tintner Method

Tintner adopts Frisch's view that the variables are composed of two parts,

$$x_{it} = M_{it} + v_{it}, \qquad i = 1, \ldots, m; \; t = 1, \ldots, n,$$

where $M_{it}$ is the systematic or "true" part and $v_{it}$ is the random or "error" component which arises as error of measurement. The $v_{it}$ are supposed to be normally distributed with mean zero.
Figure 1. Bunch Maps for Variables in the French Economy Data
We assume that the variance-covariance matrix $[\sigma_{ij}]$ of the errors is known or that it can be estimated, for example, by the Variate Difference Method. Let $[V_{ij}]$ denote the estimate of $[\sigma_{ij}]$. Then, to estimate the number of independent linear relationships among the $M_i$ in the hypothetically infinite population which corresponds to our sample, Tintner suggests that the following determinantal equation be formed:

$$|a_{ij} - \lambda V_{ij}| = 0 \qquad (5)$$

where $a_{ij}$ is the covariance of $x_i$ and $x_j$, and $a_{ii}$ the variance of $x_i$.


We form the test function

$$\Lambda_r = (n - 1)(\lambda_1 + \lambda_2 + \cdots + \lambda_r),$$

where $\lambda_1$ is the smallest root of (5), $\lambda_2$ the next smallest, and so on. According to Hsu [27], $\Lambda_r$ is asymptotically $\chi^2$ distributed with $r(n - m - 1 + r)$ degrees of freedom. In addition, Anderson [1] has


shown that 


has an asymptotic normal distribution with mean 0 and variance 1. 
Therefore, if $\Lambda_1, \Lambda_2, \ldots, \Lambda_q$ are not significant but $\Lambda_{q+1}$ is, we can conclude that there are $q$ independent linear relationships among the $M_i$.
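The determinantal equation (5) and the test function can be sketched as follows. Everything here is an illustrative assumption: $A$ is taken as an exact population covariance built from "true" parts obeying one linear relation ($M_3 = M_1 + M_2$) plus an error covariance $V$, so no sampling or $\chi^2$ comparison is involved. The smallest root then equals exactly 1, which is the value expected when the error covariance is correctly specified; the helper names are hypothetical:

```python
import numpy as np


def tintner_roots(A: np.ndarray, V: np.ndarray) -> np.ndarray:
    """Roots of |A - lambda V| = 0 in ascending order, for symmetric A and positive definite V."""
    L = np.linalg.cholesky(V)          # V = L L'
    L_inv = np.linalg.inv(L)
    return np.linalg.eigvalsh(L_inv @ A @ L_inv.T)


def tintner_statistic(roots: np.ndarray, n: int, r: int) -> float:
    """Test function Lambda_r = (n - 1) times the sum of the r smallest roots."""
    return (n - 1) * float(np.sum(roots[:r]))


B = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
S = B @ B.T                    # covariance of the true parts: singular, since M3 = M1 + M2
V = np.diag([0.5, 1.0, 2.0])   # assumed error covariance
A = S + V                      # covariance of the observed variables

roots = tintner_roots(A, V)
print(roots)                   # the smallest root is 1: one exact relation among the M_i
print(tintner_statistic(roots, n=20, r=1))
```

The Cholesky transformation turns the generalized problem $|A - \lambda V| = 0$ into an ordinary symmetric eigenvalue problem, which keeps the sketch within core NumPy.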


Tintner published his technique in 1952, but it has since seen little use owing to a number of inherent weaknesses. These include the facts that it is valid only for large samples and that its appropriateness is conditional upon the existence of errors of observation. Moreover, it relies on the assumption that the variance-covariance matrix of the errors can be known or estimated.


2.4 Farrar-Glauber Technique

In recent years, more satisfactory methods have emerged, one major contribution being that of Farrar and Glauber [15]. Viewing multicollinearity as a feature of the sample rather than of the population, they define multicollinearity in terms of departures from orthogonality. Under the assumption that the sample is taken from an orthogonal, multivariate normal population, they then propose a three-level test for the "presence, location and severity of multicollinearity". At the primary level of detection, the determinant of $X'X$ is transformed into

$$\chi^2(v) = -\left[n - 1 - \tfrac{1}{6}(2m + 5)\right]\log_e |X'X|,$$

which was shown by Bartlett [2] to have an approximate chi-square distribution with $v = m(m - 1)/2$ degrees of freedom under the null hypothesis that the columns of $X$ are orthogonal ($|X'X| = 1$). Next, to determine
which particular variable is affected by multicollinearity, the diagonal elements of $(X'X)^{-1}$ are transformed in such a way as to enable the use of the F test. By applying the results obtained by Wilks [60] on the distribution of the ratio of the determinant of a correlation matrix to the product of $h$ of its principal minors which are mutually exclusive, and making the transformation

$$w = (r^{ii} - 1)\,\frac{v_1}{v_2}, \qquad (6)$$

where $v_1 = n - m$ and $v_2 = m - 1$, Farrar and Glauber derive the density function of $w$ as an F distribution with $v_1$ and $v_2$ degrees of freedom.


Since $r^{ii} = 1/(1 - R_i^2)$, (6) can be written as

$$w = \frac{R_i^2}{1 - R_i^2}\,\frac{v_1}{v_2}.$$
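The first two levels of the Farrar-Glauber procedure can be sketched for the 1949-1966 correlation matrix $C$ of Chapter I, taking $n = 18$ observations and $m = 3$ regressors (our reading of the symbols). The statistic $w$ follows the transformation (6) with $v_1 = n - m$ and $v_2 = m - 1$; critical values are left to standard tables, and the function name `farrar_glauber` is hypothetical:

```python
import numpy as np


def farrar_glauber(C: np.ndarray, n: int):
    """Chi-square statistic for |C| and the statistics w_i of (6)."""
    m = C.shape[0]
    chi2 = -(n - 1 - (2 * m + 5) / 6.0) * np.log(np.linalg.det(C))
    df = m * (m - 1) // 2
    r_ii = np.diag(np.linalg.inv(C))
    v1, v2 = n - m, m - 1
    w = (r_ii - 1.0) * v1 / v2            # transformation (6)
    return chi2, df, w, v1, v2


C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])
chi2, df, w, v1, v2 = farrar_glauber(C, n=18)
print(f"chi-square = {chi2:.2f} on {df} degrees of freedom")
print("w for x2, x3, x4:", np.round(w, 2), f"(v1 = {v1}, v2 = {v2})")
```

The chi-square value far exceeds the 5% point of $\chi^2$ with 3 degrees of freedom (about 7.81), and the $w$ statistics single out $x_2$ and $x_4$.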


Finally, a notion of the pattern of interdependence can be obtained by examining the partial correlation coefficients of the variables. Farrar and Glauber show that normalized off-diagonal elements of $(X'X)^{-1}$ yield the partial correlation coefficients among the independent variables, namely, for any pair $x_i$, $x_j$,

$$r_{ij\cdot} = \frac{-r^{ij}}{\sqrt{r^{ii}\,r^{jj}}}.$$

The t test is used as a criterion, since the statistic

$$t(v) = \frac{r_{ij\cdot}\sqrt{n - m}}{\sqrt{1 - r_{ij\cdot}^2}}$$

has a t distribution with $v = n - m$ degrees of freedom.
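The third level, partial correlations from the normalized off-diagonal elements of the inverse together with the accompanying t statistics, can be sketched on the same correlation matrix $C$, again assuming $n = 18$ and $m = 3$:

```python
import numpy as np

C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])
n, m = 18, 3
names = ["x2", "x3", "x4"]

inv = np.linalg.inv(C)
scale = np.sqrt(np.diag(inv))
partial = -inv / np.outer(scale, scale)   # normalized off-diagonal elements
np.fill_diagonal(partial, 1.0)

t_values = {}
for i in range(3):
    for j in range(i + 1, 3):
        r = partial[i, j]
        t_values[(i, j)] = r * np.sqrt(n - m) / np.sqrt(1.0 - r ** 2)
        print(f"{names[i]}, {names[j]}: partial r = {r:7.4f}   t = {t_values[(i, j)]:8.2f}")
```

Only the pair $(x_2, x_4)$ shows a large partial correlation, which locates the interdependence in that pair.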


Nevertheless, as Haitovsky has pointed out, since economic data are hardly ever orthogonal, this test is of little meaning. He suggests an alternative method using the statistic

$$\chi^2(v) = -\left[n - 1 - \tfrac{1}{6}(2m + 5)\right]\log_e\left(1 - |X'X|\right).$$

In this context, the null hypothesis becomes $|X'X| = 0$, that is, the data are perfectly collinear. The value of $\chi^2$ would be small when multicollinearity is high, since $|X'X|$ would approach zero. The severity of multicollinearity can be measured by the level of significance at which the null hypothesis is accepted. A $\chi^2$ value significant at the 0.9 level, for example, would indicate a high degree of multicollinearity.
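For comparison, Haitovsky's statistic can be evaluated alongside the Farrar-Glauber one on the same correlation matrix $C$ (with $n = 18$ and $m = 3$ assumed); a value near zero corresponds to data close to perfect collinearity:

```python
import numpy as np

C = np.array([
    [1.00000, 0.21545, 0.99893],
    [0.21545, 1.00000, 0.21369],
    [0.99893, 0.21369, 1.00000],
])
n, m = 18, 3
det_c = np.linalg.det(C)
factor = n - 1 - (2 * m + 5) / 6.0

fg_stat = -factor * np.log(det_c)          # Farrar-Glauber, null hypothesis: orthogonality
h_stat = -factor * np.log(1.0 - det_c)     # Haitovsky, null hypothesis: perfect collinearity
print(f"|C| = {det_c:.6f}   Farrar-Glauber = {fg_stat:.2f}   Haitovsky = {h_stat:.4f}")
```

The two statistics move in opposite directions as $|X'X|$ shrinks, which is the sense in which Haitovsky's test addresses the converse hypothesis.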


In a recent review, Raduchel agrees with Haitovsky's comment on the test of Farrar and Glauber. He questions, though, the usefulness of Haitovsky's proposal of a heuristically motivated test of the converse hypothesis: in practice the situation of perfect collinearity is just as unlikely as the other extreme, and when it does occur, there would be little meaning in applying regression anyway.
2.5 Kirchdorfer Procedure

The latest development in the problem of detection comes from the German statistician Kirchdorfer [30]. His method is based on the Gram-Schmidt orthogonalization process. Suppose $X$ is factored into a matrix $D$ with orthonormal columns and an upper triangular matrix $U$:

$$X = DU. \qquad (7)$$


The elements of $D$ and $U$ are determined by a process involving an intermediate matrix $E$. Let $x_{ij}$ denote the elements of $X$, the $x_{ij}$ for fixed $j$ being the observations on the variable $x_j$. The elements of $D$, $U$ and $E$, denoted by $d_{ij}$, $u_{ij}$ and $e_{ij}$ respectively, are obtained in the following way.

Beginning with $k = 0$, let

The above procedure is repeated from (9) to (12) until


By simple manipulation of the original model, Kirchdorfer derives the result

Since

$$(X'X)^{-1} = (U'D'DU)^{-1} = (U'U)^{-1} \;\; (\text{by orthonormality of } D) \;\; = U^{-1}(U')^{-1}, \qquad (13)$$

the process of detecting multicollinearity is much simplified by considering the matrix $U$ instead. An examination of the size of the diagonal elements of $U$ would indicate the source of multicollinearity. In the general case, if $u_{ii}$ is small, Kirchdorfer concludes that $x_i$ is correlated with the remaining independent variables.
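The factorization (7) and the identity (13) can be illustrated with a classical Gram-Schmidt sketch. This is the textbook algorithm, not a transcription of Kirchdorfer's steps (9) to (12); the design matrix is randomly generated and hypothetical, and `gram_schmidt` is our own helper:

```python
import numpy as np


def gram_schmidt(X: np.ndarray):
    """Classical Gram-Schmidt: X = D U, D with orthonormal columns, U upper triangular."""
    n, m = X.shape
    D = np.zeros((n, m))
    U = np.zeros((m, m))
    for k in range(m):
        v = X[:, k].copy()
        for j in range(k):
            U[j, k] = D[:, j] @ X[:, k]   # coefficient of x_k on the earlier direction d_j
            v -= U[j, k] * D[:, j]
        U[k, k] = np.linalg.norm(v)       # length of the part of x_k not explained by x_1..x_{k-1}
        D[:, k] = v / U[k, k]
    return D, U


rng = np.random.default_rng(1)
X = rng.normal(size=(20, 4))              # hypothetical full-rank design
D, U = gram_schmidt(X)

U_inv = np.linalg.inv(U)
print(np.allclose(D @ U, X))                                  # factorization (7)
print(np.allclose(D.T @ D, np.eye(4)))                        # orthonormal columns of D
print(np.allclose(np.linalg.inv(X.T @ X), U_inv @ U_inv.T))   # identity (13)
```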


Kirchdorfer's procedure is a recent one, and since its publication in 1971 no reports have appeared in the literature concerning either its theory or its practice. It therefore seemed of interest to apply the technique to our data on the French economy (Table I). Using Program A (Appendix II), the resultant matrix $U$ is obtained as follows:

Examining the $u_{ii}$ elements, it is seen that none of them has a small value. More importantly, variable No. 4 is not identifiable on the basis of Kirchdorfer's procedure as the source of multicollinearity. Yet, as we recall, the calculations of Chapter I and the bunch map analysis performed earlier in this chapter point to the existence of multicollinearity between variables No. 2 and 4. Thus our particular set of data constitutes a counterexample to Kirchdorfer's procedure.


How may this phenomenon be explained? Consider again equation (13),

$$(X'X)^{-1} = U^{-1}(U^{-1})',$$

where $u^{ij}$ is the $ij$-th element in the matrix $U^{-1}$.

We know that when $x_i$ is multicollinear with other members of the independent set, the $i$-th diagonal element $x^{ii}$ of $(X'X)^{-1}$ will be large. From our set of equations (14), it is clear that a large $x^{ii}$ forces $u_{ii}$ to be very small only in the case $i = m$. For all $i = 1, 2, \ldots, m-1$, the presence of additional squared terms in the set of equations (14) may result in $1/u_{ii}$ not necessarily being large even if $x^{ii}$ is large.


In sum, Kirchdorfer's method succeeds with certainty only when the $m$-th variable is affected by multicollinearity. In all other instances of multicollinearity, the diagonal elements of $U$ may or may not be small, so that no certain indication is obtained as to whether or not the problem exists.
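The argument can be made concrete with a small constructed example (hypothetical data, our own construction): place a nearly collinear pair first and an unrelated variable last, with all columns scaled to unit length. NumPy's Householder-based `np.linalg.qr` is used in place of Gram-Schmidt; it yields the same $U$ up to the signs of its rows, which does not affect the squared quantities below:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 30
z = rng.normal(size=n)
x1 = z / np.linalg.norm(z)
x2 = z + 0.01 * rng.normal(size=n)   # nearly collinear with x1
x2 /= np.linalg.norm(x2)
x3 = rng.normal(size=n)
x3 /= np.linalg.norm(x3)
X = np.column_stack([x1, x2, x3])    # unit-length columns, collinear pair placed first

_, U = np.linalg.qr(X)               # U upper triangular, X = Q U
xx_inv = np.linalg.inv(X.T @ X)
U_inv = np.linalg.inv(U)

# Each diagonal element of (X'X)^{-1} is the sum of squares of a row of U^{-1}.
# For the last variable this is just 1/u_mm^2; for the others extra terms appear.
for i in range(3):
    print(f"x{i + 1}: (X'X)^-1 diagonal = {xx_inv[i, i]:12.2f}   1/u_ii^2 = {1.0 / U[i, i] ** 2:12.2f}")
```

Here $x_1$ has a very large diagonal element of $(X'X)^{-1}$ while $|u_{11}| = 1$, so inspecting $u_{11}$ alone would not flag it; the collinearity enters through the squared off-diagonal term of $U^{-1}$.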


Kirchdorfer's own numerical example illustrates the above argument well. It is coincidental that only the third variable in his set of $x_i$'s is affected by multicollinearity, as evidenced by the significantly small value of $u_{33}$. Had multicollinearity been located in other variables, and not in the last variable $x_3$, then $u_{ii}$ might not necessarily have been small.
CHAPTER III


RESOLVING MULTICOLLINEARITY 


The discussion thus far has delineated multicollinearity in terms of its detection by a variety of methods. Given that the problem indeed exists in a particular situation, the next logical task is the search for a meaningful remedy. A number of alternative means of resolution have been proposed, some for cases where $X$ is less than full rank, and others for cases where multicollinearity is only approximate. Procedures in the latter category are less easily carried out than those used to resolve situations of perfect collinearity, on account of the need for a priori information or tedious computation. What follows is a review of the procedures of both categories that have been suggested to date.


3.1 Generalized Inverses

One approach to the estimation of the linear model of less than full rank, using generalized inverses, has been discussed in Chapter I. In an alternative approach, the parameters are subjected to linear constraints of the form

$$R\beta = c,$$

where $R$ is an $s \times m$ matrix of rank $s$ and $s \le m - r < m$. Minimization of the constrained sum of squares leads to equations of the form

or

$$M\begin{pmatrix} \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} X'X & R' \\ R & 0 \end{pmatrix}\begin{pmatrix} \beta \\ \gamma \end{pmatrix} = \begin{pmatrix} X'y \\ c \end{pmatrix}, \qquad (15)$$

where $\gamma$ is a vector of Lagrange multipliers. We have the following lemma.
Lemma 3.1.1. A $g_1$-inverse of $M$ is given by

$$M^{g} = \begin{pmatrix} K^{g} - K^{g}R'G^{g}RK^{g} & K^{g}R'G^{g} \\ G^{g}RK^{g} & I - G^{g} \end{pmatrix}, \qquad (16)$$

where $K = X'X + R'R$ and $G = RK^{g}R'$.

The derivation of Lemma 3.1.1 is based on the following results due to Bose [6] and Rao [47].

(1) (i) $X(X'X)^{g}X'X = X$.
(ii) $X'X(X'X)^{g}X' = X'$.
(iii) $AX^{g}X = A$ iff the rows of $A$ are contained in the row space of $X$.

(2) If $H$ is an $s \times k$ matrix such that the row space of $H$ is contained in the row space of the $k \times k$ matrix $S$, then
(i) $r(HS^{g}H') = r(H)$, where $r(H)$ denotes the rank of $H$;
(ii) $HS^{g}H'$ is unique and positive definite.
Proof of Lemma 3.1.1 (Dwyer [13]). Let $M$ be premultiplied by

thus the following relations hold:

$$RK^{g}K = R \qquad (17)$$
$$KK^{g}R' = R' \qquad (18)$$
$$XK^{g}K = X \qquad (19)$$
$$KK^{g}X' = X' \qquad (20)$$

It follows from (17), (18) and (19) that

$$PM = \begin{pmatrix} K & R' \\ 0 & -G \end{pmatrix}$$

and

$$M^{g} = \begin{pmatrix} K & R' \\ 0 & -G \end{pmatrix}^{g} P$$

with

$$P = \begin{pmatrix} I & R' \\ -RK^{g} & I - G \end{pmatrix}.$$

By (17) and (2)(i) and (ii), $G = RK^{g}R'$ is unique, positive definite and has the same rank as $R$. It then follows from (1)(iii) that, since the row spaces of $R'$ and $G$ are the same,

$$R'G^{g}G = R'.$$

On completion of the reduction, $M^{g}$ is obtained as given in (16).
It is evident that a solution to equation (15) is given by

$$\hat\beta = \left(K^{g} - K^{g}R'G^{g}RK^{g}\right)X'y + K^{g}R'G^{g}c. \qquad (21)$$

Rao [47] has shown that $L'\hat\beta$, with $\hat\beta$ as given in (21), is the best linear unbiased conditional estimate of an estimable function $L'\beta$.


Plackett [45] considers the case where the restrictions are chosen in such a way that $[X' \;\; R']$ has full rank $m$ ($R$ is thus complementary to $X$) and obtains the minimum variance conditionally unbiased estimator of $\beta$ as

$$\hat\beta = (X'X + R'R)^{-1}(X'y + R'c).$$
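Plackett's estimator is simple to compute once a complementary $R$ is chosen. In this sketch the data, the rank-deficient design (third column twice the second) and the restriction $R = (0, 2, -1)$, $c = 0$ are all hypothetical; the checks confirm that the estimate satisfies both the normal equations (2) and the restriction:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 12
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x, 2.0 * x])    # rank 2 < m = 3 (third column = 2 x second)
y = 1.0 + 0.5 * x + rng.normal(scale=0.1, size=n)

R = np.array([[0.0, 2.0, -1.0]])                 # hypothetical restriction, not in the row space of X
c = np.array([0.0])

K = X.T @ X + R.T @ R                            # nonsingular because [X' R'] has rank m
beta = np.linalg.solve(K, X.T @ y + R.T @ c)     # Plackett's estimator

print(beta)
print(np.allclose(X.T @ X @ beta, X.T @ y))      # satisfies the normal equations (2)
print(np.allclose(R @ beta, c))                  # satisfies the restriction
```

Because $R$ is complementary to $X$, the restriction picks out exactly one of the infinitely many solutions of the normal equations, and adding $R'R$ to $X'X$ makes that choice with an ordinary inverse.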


A further illustration of Plackett's solution in terms of $g$-inverses is provided by Chipman [9]. Defining

$$Z = \begin{bmatrix} X \\ R \end{bmatrix},$$

we have

Thus, premultiplying (22) by $X$ and postmultiplying by $(X'X + R'R)^{-1}X'$, conditions (i) and (ii) of a $g$-inverse are verified respectively. Verification of condition (iii) is trivial. Performing similar operations on $(X'X + R'R)^{-1}R'$, we thus have

$$\tilde{\beta} = X^{g}y + R^{g}c,$$

where $X^{g} = (X'X + R'R)^{-1}X'$ and $R^{g} = (X'X + R'R)^{-1}R'$.
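Plackett's estimator and Chipman's $g$-inverse reading of it can be checked the same way. In the sketch below (simulated data; names are illustrative), $X$ has rank 2 and $R = (0, 0, 1)$ is complementary to it, so $X'X + R'R$ is invertible even though $X'X$ is not.

```python
import numpy as np

rng = np.random.default_rng(1)
x1, x2 = rng.normal(size=15), rng.normal(size=15)
X = np.column_stack([x1, x2, x1 - 2.0 * x2])   # rank 2, so X'X is singular
y = rng.normal(size=15)
R = np.array([[0.0, 0.0, 1.0]])                # complementary: [X' R'] has rank 3
c = np.array([0.3])

S = X.T @ X + R.T @ R                          # non-singular because R complements X
Sinv = np.linalg.inv(S)
beta = Sinv @ (X.T @ y + R.T @ c)              # Plackett's estimator
Xg = Sinv @ X.T                                # Chipman: a g-inverse of X
Rg = Sinv @ R.T
print(beta)
```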


3.2 Using Additional Data 


One proposed remedy involves the acquisition of additional data, which, however, may not always be available. In the fortunate case when the researcher has access to new data, an efficient criterion of selection which would best reduce the standard errors of the estimates has been suggested by Silvey [51]. He points out that for the parametric function $L'\beta$ to be estimable, $L$ must be a linear combination of eigenvectors of $(X'X)$ corresponding to non-zero roots. $L$ can thus be written in the form

$$L = a_1v_1 + a_2v_2 + \cdots + a_sv_s,$$

where the $v_i$ are normalized eigenvectors of $(X'X)$. Utilizing the fact that
which implies that the bigger any $\lambda_i$, the smaller is its contribution to the variance, Silvey is thus able to show that precise estimation is possible in the direction of eigenvectors corresponding to large eigenvalues.
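For non-singular $X'X$ the identity behind Silvey's argument is $\operatorname{var}(L'\hat{\beta}) = \sigma^2 L'(X'X)^{-1}L = \sigma^2\sum_i a_i^2/\lambda_i$. The sketch assumes $\sigma^2 = 1$ and a simulated near-collinear design, and evaluates the variance both ways for $L$ along the weakest and along the strongest eigenvector.

```python
import numpy as np

rng = np.random.default_rng(2)
z = rng.normal(size=30)
X = np.column_stack([z, z + 0.01 * rng.normal(size=30), rng.normal(size=30)])
XtX = X.T @ X
lam, V = np.linalg.eigh(XtX)        # ascending eigenvalues; columns of V are the v_i
sigma2 = 1.0                        # assumed error variance

def var_Lbeta(L):
    """var(L' beta_hat) computed directly and through L = sum a_i v_i."""
    direct = sigma2 * L @ np.linalg.solve(XtX, L)
    a = V.T @ L                     # coordinates a_i of L in the eigenvector basis
    return direct, sigma2 * np.sum(a**2 / lam)

weak = var_Lbeta(V[:, 0])           # direction of the smallest root
strong = var_Lbeta(V[:, -1])        # direction of the largest root
print(weak, strong)
```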
Silvey's selection criterion is to assume that an additional observation $y_{n+1}$ is taken at the values $x'_{n+1} = (x_{n+1,1}, x_{n+1,2}, \ldots, x_{n+1,m})$ of the independent variables, where $x_{n+1} = l\,v_i$, $l$ being some non-zero number.


The new model is

or

$$y_1 = X_1\beta + u_1.$$

Then
so that $v_j$ is an eigenvector of $X_1'X_1$ corresponding to root $\mu_j$.
The eigenvectors of $(X'X)$ are thus those of $X_1'X_1$, and all eigenvalues are the same except that $\lambda_i$ is now increased to $(\lambda_i + l^2)$. Therefore, if the new independent variables are chosen in the direction of eigenvectors of $(X'X)$ corresponding to small roots, the standard error would be reduced.
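The eigenvalue update can be seen numerically. Assuming a simulated design and a hypothetical length $l = 2$, appending the row $l\,v_i$ for the eigenvector with the smallest root raises that root by $l^2$ and leaves the others unchanged.

```python
import numpy as np

rng = np.random.default_rng(3)
z = rng.normal(size=25)
X = np.column_stack([z, z + 0.05 * rng.normal(size=25), rng.normal(size=25)])
lam, V = np.linalg.eigh(X.T @ X)    # ascending, so V[:, 0] belongs to the smallest root

l = 2.0                             # hypothetical length of the new observation vector
x_new = l * V[:, 0]                 # new row placed along the weakest direction
X1 = np.vstack([X, x_new])
lam1 = np.linalg.eigvalsh(X1.T @ X1)

# Expected roots: lambda_1 + l^2 together with the untouched remaining roots.
expected = np.sort(np.concatenate([[lam[0] + l**2], lam[1:]]))
print(lam, lam1)
```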


In the case when the new observation $x_{n+1}$ is not necessarily in the direction of an eigenvector of $(X'X)$, Silvey shows that the optimum direction of $x_{n+1}$, subject to the condition $x'_{n+1}x_{n+1} = l^2$,


is that of the vector tae 


x'x) ty , which holds for both singular 
or nonsingular $X$. The same result has been obtained by Gupta [21] in a simpler and more concise fashion.


Researchers have also grappled with the problem of treating 
multicollinearity when additional data are not readily available. Of 
several possible remedies that have appeared, some have been more 
successful than others. The methods presently available will now be 


discussed. 


3.3 Incorporating Extraneous Information 


One procedure which has seen extensive use involves the incorporation of information extraneous to the sample, followed by re-estimation of the regression equation. An investigator, for instance, may have knowledge of the ratio of some coefficients. Alternatively, the values of certain coefficients or of their linear combinations may be known. This procedure of using extraneous information varies in form according to the type of information available.


The method commonly employed by econometricians involves 
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combining time-series and cross-sectional samples. An example is seen 
in demand studies where income and prices in time-series data are usually 
collinear. Cross-sectional samples are then used for estimating the 


income coefficients. The procedure may be formulated in the following 


way. 


Suppose that we have estimates of $(m-p)$ of the $m$ elements of $\beta$. Without loss of generality, we may renumber the $X$ variables so that the estimated coefficients refer to the last $(m-p)$ variables. The coefficients of the first $p$ variables are then to be estimated.

Consider the partitioned relationship

$$y = X_1\beta_1 + X_2\beta_2 + u. \qquad (23)$$
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and 
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and from the assumption of independence of the two sets of data, the 
One shortcoming of the above procedure has been attributed to the fact that cross-sectional data are by nature usually long-run, whereas annual time-series data are often short-run in character. As Kuh and Meyer [33] have pointed out, the combination of different structures to overcome multicollinearity is improper, and in fact leads to discrepancies in the estimation.
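A common two-step version of this procedure treats an external (for example cross-section) estimate $b_2$ of $\beta_2$ as given, moves $X_2b_2$ to the left-hand side of (23), and regresses $y - X_2b_2$ on $X_1$. The sketch below is only an illustration under stated assumptions: it ignores the sampling error in $b_2$ that a full treatment must carry through, it uses noise-free simulated price and income series, and the function name is hypothetical.

```python
import numpy as np

def pooled_estimate(y, X1, X2, b2):
    """Regress y - X2 @ b2 on X1, treating the extraneous estimate b2 as fixed."""
    b1, *_ = np.linalg.lstsq(X1, y - X2 @ b2, rcond=None)
    return b1

rng = np.random.default_rng(4)
price = rng.normal(size=40)
income = price + 0.02 * rng.normal(size=40)   # highly collinear with price
X1, X2 = price[:, None], income[:, None]
y = -0.8 * price + 1.2 * income               # noise-free, so the check is exact

b1 = pooled_estimate(y, X1, X2, np.array([1.2]))   # b2 "from cross-section data"
print(b1)
```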


As the second variant of the procedure of using extraneous information, suppose that the extraneous information consists of exact linear restrictions on the coefficients,
where $h$ is a $k \times 1$ known vector and $H'$ a $k \times m$ known matrix of rank $k < m$. We have, therefore, $k$ independent restrictions on the elements of $\beta$.


In incorporating this information, we utilize the method of Lagrange multipliers to determine the estimator of $\beta$ which minimizes $(y-X\beta)'(y-X\beta)$ subject to the restriction $H'\beta - h = 0$. It can easily be shown that the solution for the coefficient estimator under the constraint is

$$\beta^{*} = \hat{\beta} + (X'X)^{-1}H[H'(X'X)^{-1}H]^{-1}(h - H'\hat{\beta}). \qquad (25)$$

Substitution of $\hat{\beta} = \beta + (X'X)^{-1}X'u$ into (25) yields

$$\beta^{*} = \beta + (X'X)^{-1}X'u + (X'X)^{-1}H[H'(X'X)^{-1}H]^{-1}[h - H'\beta - H'(X'X)^{-1}X'u].$$

Therefore, if $H'\beta = h$ is true, $\beta^{*}$ is unbiased.


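The variance-covariance matrix of $\beta^{*}$ is given by

$$V_{\beta^{*}} = E(\beta^{*}-\beta)(\beta^{*}-\beta)' = V - VH(H'VH)^{-1}H'V,$$

where $V = V_{\hat{\beta}} = E(\hat{\beta}-\beta)(\hat{\beta}-\beta)'$ is the variance-covariance matrix of the ordinary least squares estimator $\hat{\beta}$.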
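Equation (25) and the variance formula can be coded directly. The sketch below (simulated data, $\sigma^2$ assumed known, names illustrative) imposes $\beta_1 = \beta_2$ and checks the variance formula against the covariance of $\beta^{*}$ written as a linear function of $y$.

```python
import numpy as np

def restricted_ls(X, y, H, h):
    """Equation (25): beta* = b + (X'X)^{-1} H [H'(X'X)^{-1} H]^{-1} (h - H'b)."""
    A = np.linalg.inv(X.T @ X)
    b = A @ X.T @ y                                  # ordinary least squares
    M = H.T @ A @ H
    return b + A @ H @ np.linalg.solve(M, h - H.T @ b), b

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 3))
y = X @ np.array([1.0, 1.0, -0.5]) + 0.3 * rng.normal(size=30)
H = np.array([[1.0], [-1.0], [0.0]])                 # m x k: restriction beta_1 - beta_2 = 0
h = np.array([0.0])
beta_star, b = restricted_ls(X, y, H, h)

sigma2 = 0.09                                        # assumed known error variance
V = sigma2 * np.linalg.inv(X.T @ X)                  # variance of OLS
V_star = V - V @ H @ np.linalg.solve(H.T @ V @ H, H.T @ V)
print(beta_star, np.diag(V_star))
```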
This estimator $\beta^{*}$ has been shown by Theil [52] to be the best linear unbiased estimator of $\beta$ in the class of all unbiased estimators which are linear functions of $y$ and $h$, provided $H'\beta = h$.
The third context involving the use of extraneous information is a method which utilizes both the extraneous and the sample information to estimate $\beta_1$ and $\beta_2$ efficiently. We assume that the extraneous information may be represented by

$$r = R\beta + d. \qquad (26)$$


Combining (26) with the basic model, we have

$$E\begin{bmatrix} u \\ d \end{bmatrix}\begin{bmatrix} u' & d' \end{bmatrix} = \begin{bmatrix} \sigma^2 I & 0 \\ 0 & \Psi \end{bmatrix}, \qquad (27)$$

where

$$\Psi = Edd'.$$

The zero off-diagonal submatrices result from the assumption of independence between the sample and the prior information. Since the variance-covariance matrix (27) is not of the form $\sigma^2 I$, ordinary least squares cannot be used. An application of Aitken's generalized least squares procedure leads to estimates
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we have estimates of the coefficients of the last (m-p) variables 
$\hat{\beta}^{\circ}$ has the property of being the best linear unbiased estimator, where "best" refers to both the extraneous and the sample information. The variance-covariance matrix of this estimator is given by

$$\operatorname{var}(\hat{\beta}^{\circ}) = \left[\sigma^{-2}X'X + R'\Psi^{-1}R\right]^{-1}.$$

A shortcoming of this method lies in the fact that knowledge of $\sigma^2$ and $\Psi$ is required. This difficulty can be circumvented by employing unbiased estimators of these variances and covariances. Theil [53], in search of a heuristic procedure, has suggested the following conditional estimator of $\beta$
where

$$s^2 = y'[I - X(X'X)^{-1}X']y/(n-m).$$
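Taking the extraneous information in the form $r = R\beta + d$, the generalized least squares estimate from the stacked system has the familiar mixed-estimation form $\hat{\beta}^{\circ} = [\sigma^{-2}X'X + R'\Psi^{-1}R]^{-1}[\sigma^{-2}X'y + R'\Psi^{-1}r]$, and replacing $\sigma^2$ by $s^2$ gives a feasible version in the spirit of Theil's suggestion. The sketch assumes $r$, $R$ and $\Psi$ are supplied by the user; the function name and the simulated data are illustrative.

```python
import numpy as np

def mixed_estimate(X, y, R, r, Psi, sigma2=None):
    """Aitken (GLS) estimate of beta from y = X beta + u and r = R beta + d.
    If sigma2 is None it is replaced by s^2 from the OLS residuals."""
    n, m = X.shape
    if sigma2 is None:
        e = y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        sigma2 = e @ e / (n - m)          # s^2 = y'[I - X(X'X)^{-1}X']y / (n - m)
    Pinv = np.linalg.inv(Psi)
    P = X.T @ X / sigma2 + R.T @ Pinv @ R  # inverse of the estimator's covariance
    q = X.T @ y / sigma2 + R.T @ Pinv @ r
    return np.linalg.solve(P, q), np.linalg.inv(P)

rng = np.random.default_rng(6)
z = rng.normal(size=25)
X = np.column_stack([z, z + 0.05 * rng.normal(size=25)])
y = X @ np.array([0.7, 0.4]) + 0.2 * rng.normal(size=25)
R = np.array([[0.0, 1.0]])                 # prior information on beta_2 only
r = np.array([0.4])
beta_mixed, cov_mixed = mixed_estimate(X, y, R, r, np.array([[0.01]]))
print(beta_mixed)
```

A very large $\Psi$ (vague prior information) drives the result back to ordinary least squares, which is a convenient sanity check.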

Alternatively, the investigator may have knowledge about the bounds on the values of some coefficients. Suppose it is known a priori that the coefficient $\beta_1$ lies between 0 and 1, probably between $\frac{1}{4}$ and $\frac{3}{4}$. This knowledge can be formulated as

$$\tfrac{1}{2} = \beta_1 + d,$$

with

$$Ed = 0, \qquad Ed^2 = \tfrac{1}{16}.$$

Since the standard deviation of $d$ is $\sigma_d = \frac{1}{4}$, $\frac{1}{2} \pm \sigma_d$ gives a range from $\frac{1}{4}$ to $\frac{3}{4}$, and $\frac{1}{2} \pm 2\sigma_d$ gives a range from 0 to 1. The procedure described in the preceding paragraph is then applicable to obtain the best linear estimator. In this case, we set
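In the notation $r = R\beta + d$ of the mixed estimator, this prior amounts to the settings below. The two-regressor layout of `R` is an assumption made for illustration, and the last lines simply recover the two ranges quoted above.

```python
import numpy as np

# Prior knowledge: beta_1 lies in [0, 1], probably in [1/4, 3/4].
r = np.array([0.5])             # prior mean: the midpoint of the range
R = np.array([[1.0, 0.0]])      # picks out beta_1 (two regressors assumed)
Psi = np.array([[1.0 / 16]])    # Ed^2 = 1/16, so the prior s.d. is 1/4

sd = np.sqrt(Psi[0, 0])
one_sd = (r[0] - sd, r[0] + sd)           # (0.25, 0.75)
two_sd = (r[0] - 2 * sd, r[0] + 2 * sd)   # (0.0, 1.0)
print(one_sd, two_sd)
```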
Knowledge about linear combinations of coefficients may also be handled by a procedure similar to the above.

Finally, we may have situations in which the extraneous information takes the form

$$H'\beta \geq h.$$


In this event, the familiar technique of quadratic programming is applied to minimize the sum of squared deviations $(y-X\beta)'(y-X\beta)$ subject to the constraints.
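For a handful of inequality constraints $H'\beta \geq h$, the quadratic program can be solved exactly by brute force: every subset of constraints is tried as a set of equalities using (25), and the feasible candidate with the smallest residual sum of squares is kept. This sketch assumes that the columns of $H$ are linearly independent and that $X$ has full column rank; its cost grows exponentially with $k$, so a proper quadratic programming routine would be used in practice. The function name is hypothetical.

```python
import itertools
import numpy as np

def ls_with_inequalities(X, y, H, h):
    """Minimise (y - Xb)'(y - Xb) subject to H'b >= h by trying every subset of
    constraints as active equalities and keeping the best feasible candidate."""
    A = np.linalg.inv(X.T @ X)
    b_ols = A @ X.T @ y
    best, best_sse = None, np.inf
    k = H.shape[1]
    for size in range(k + 1):
        for S in itertools.combinations(range(k), size):
            S = list(S)
            b = b_ols.copy()
            if S:                               # equality-restricted fit, as in (25)
                HS = H[:, S]
                b = b_ols + A @ HS @ np.linalg.solve(HS.T @ A @ HS, h[S] - HS.T @ b_ols)
            if np.all(H.T @ b >= h - 1e-10):    # keep only feasible candidates
                sse = np.sum((y - X @ b) ** 2)
                if sse < best_sse:
                    best, best_sse = b, sse
    return best

rng = np.random.default_rng(7)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, -0.5, 0.8]) + 0.1 * rng.normal(size=40)
H, h = np.eye(3), np.zeros(3)                   # example: all coefficients non-negative
b = ls_with_inequalities(X, y, H, h)
print(b)
```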


We note in conclusion that the utilization of extraneous information, as a procedure for resolving multicollinearity, leads to better estimates. One shortcoming, though, may be said to reside in the fact that a priori information often consists of facts or relationships adduced from empirical economic studies. The validity of this information is therefore always a problem to be faced. Fox [17] has in fact quite strongly stated that "if we use purely arbitrary coefficients to get around a statistical impasse, we deserve criticism from both economists and statisticians".


3.4 The Mean Square Error Criterion


It has been seen in the preceding section that restrictions on 
the regression model result in reduction of the variances of the 
regression estimates, though the restricted estimator will be biased if 
the restriction is not exactly true. Testing procedures have been 
devised for rejecting or adopting restrictions on the parameter space in 


a regression model. 


The classical procedure for testing the validity of the restriction $H'\beta = h$, where $H$ is $m \times k$ of rank $k$, has been the Snedecor $F$ test, which can be shown to be uniformly most powerful (U.M.P.). It is obtained via the statistic

$$z = \frac{[SSE(\beta^{*}) - SSE(\hat{\beta})]/k}{SSE(\hat{\beta})/(n-m)},$$

where $SSE(\beta^{*})$ is the error sum of squares in the least squares regression constrained by the hypothesis and $SSE(\hat{\beta})$ is the error sum of squares in the ordinary least squares (o.l.s.) regression. Under the null hypothesis $H'\beta = h$, $z$ has the central $F$ distribution with $k$ and $n-m$ degrees of freedom. Consequently, we are able to employ the $F$ test to choose between two sets of estimators; for example, a variable is dropped when it is found insignificant using the $F$ test.
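The statistic $z$ can be computed from the two residual sums of squares. As a check, the sketch also evaluates the algebraically equivalent Wald form $(H'\hat{\beta} - h)'[H'(X'X)^{-1}H]^{-1}(H'\hat{\beta} - h)/(ks^2)$. The data and names are illustrative.

```python
import numpy as np

def f_statistic(X, y, H, h):
    """z = {[SSE(beta*) - SSE(beta_hat)]/k} / {SSE(beta_hat)/(n-m)} for H'beta = h,
    returned together with the equivalent Wald form."""
    n, m = X.shape
    k = H.shape[1]
    A = np.linalg.inv(X.T @ X)
    b = A @ X.T @ y
    b_star = b + A @ H @ np.linalg.solve(H.T @ A @ H, h - H.T @ b)   # (25)
    sse = np.sum((y - X @ b) ** 2)
    sse_star = np.sum((y - X @ b_star) ** 2)
    z = ((sse_star - sse) / k) / (sse / (n - m))
    d = H.T @ b - h
    wald = d @ np.linalg.solve(H.T @ A @ H, d) / (k * sse / (n - m))
    return z, wald

rng = np.random.default_rng(9)
X = rng.normal(size=(40, 3))
y = X @ np.array([1.0, 0.5, 0.0]) + 0.5 * rng.normal(size=40)
H = np.array([[0.0], [0.0], [1.0]])      # test beta_3 = 0, i.e. dropping x_3
h = np.array([0.0])
z, wald = f_statistic(X, y, H, h)
print(z, wald)
```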


A number of disadvantages arise, however, from using this test, as Wallace [59] has recently reiterated. Most importantly, the validity of $H'\beta = h$ constitutes an "overstrong" criterion, and even in cases where multicollinearity is severe, it would still seem reasonable to trade some bias for a smaller variance of the estimator. As an alternative which better captures the "notion of tradeoff between bias and variance", Toro-Vizcarrondo and Wallace [56] have proposed a testing procedure based on the Mean Square Error criterion.
The mean square error of an estimator is the estimator's variance plus the square of its bias. Let $\hat{\theta}$ be an $m \times 1$ vector of estimates of $\theta$; then
Between two alternative $m \times 1$ vector estimators, $\hat{\theta}$ and $\tilde{\theta}$, $\tilde{\theta}$ is said to be better in the Mean Square Error (MSE) sense if, for any $m \times 1$ vector $d \neq 0$,

$$\mathrm{MSE}(d'\tilde{\theta}) \leq \mathrm{MSE}(d'\hat{\theta}).$$

This inequality is equivalent to the requirement

$$E(\hat{\theta}-\theta)(\hat{\theta}-\theta)' - E(\tilde{\theta}-\theta)(\tilde{\theta}-\theta)' = \text{a positive semi-definite matrix}.$$

Thus, in the linear regression framework, we ask whether

$$E(\hat{\beta}-\beta)(\hat{\beta}-\beta)' - E(\beta^{*}-\beta)(\beta^{*}-\beta)' = \text{a positive semi-definite matrix}. \qquad (28)$$

If it is, $\beta^{*}$ is the better estimator by the MSE criterion.


Toro-Vizcarrondo and Wallace derive the condition for (28) to hold as

$$\frac{(H'\beta - h)'[H'(X'X)^{-1}H]^{-1}(H'\beta - h)}{\sigma^2} \leq 1. \qquad (29)$$

Dividing both sides of (29) by 2 and denoting the left-hand side so obtained by $\lambda$, (29) can be restated as

$$\lambda \leq \tfrac{1}{2}.$$

In other words, in testing whether $\beta^{*}$ is better than $\hat{\beta}$ according to the mean square error criterion, the hypothesis of interest is
They further show that the statistic $z$ has a non-central $F$ distribution with parameters $k$, $n-m$ and $\lambda$. Under the null hypothesis $H'\beta = h$, or equivalently $\lambda = 0$, $z$ has the central $F$ distribution, as stated earlier.
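The condition behind (29) can be verified numerically when $\beta$ and $\sigma^2$ are treated as known, an assumption made only for the illustration. The sketch builds the exact MSE matrices of $\hat{\beta}$ and of $\beta^{*}$ from (25) and reports $\lambda$ as defined above, so one can see that the difference is positive semi-definite exactly when $\lambda \leq \frac{1}{2}$. Names and data are illustrative.

```python
import numpy as np

def mse_difference(X, H, beta, h, sigma2):
    """MSE(beta_hat) - MSE(beta*) for the restricted estimator (25), given the
    true beta and sigma^2, together with lambda as defined from (29)."""
    A = np.linalg.inv(X.T @ X)
    M = H.T @ A @ H
    delta = H.T @ beta - h
    P = A @ H @ np.linalg.inv(M)            # (X'X)^{-1} H [H'(X'X)^{-1} H]^{-1}
    bias = -P @ delta                       # E(beta*) - beta
    mse_ols = sigma2 * A
    mse_star = sigma2 * (A - P @ H.T @ A) + np.outer(bias, bias)
    lam = delta @ np.linalg.solve(M, delta) / (2 * sigma2)
    return mse_ols - mse_star, lam

rng = np.random.default_rng(8)
X = rng.normal(size=(20, 3))
H = np.array([[0.0], [0.0], [1.0]])         # restriction beta_3 = 0 (drop x_3)
h = np.array([0.0])
for b3 in (0.05, 2.0):                      # a small and a large true beta_3
    D, lam = mse_difference(X, H, np.array([1.0, 1.0, b3]), h, 1.0)
    print(b3, round(lam, 3), np.linalg.eigvalsh(D).min())
```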


Applying a theorem from Lehmann [34] and making the transformation

$$w = \frac{kz}{n - m + kz},$$

they obtain a U.M.P. test for the MSE criterion
$w_0$ stands for the computed value of $w$, and the critical point $w^{*}$ is determined by

$$\int_0^{w^{*}} h_{1/2}(w)\,dw = 1 - \alpha, \qquad (30)$$

where $h_{\lambda}(w)$ is the density function of $w$, which can easily be derived from the non-central density of $z$ as
$h_{\lambda}(w)$ can be recognized as the beta distribution for $\lambda = 0$ and the non-central beta distribution for $\lambda > 0$.
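Where tables of the non-central beta distribution are not at hand, the critical point $w^{*}$ can be approximated by simulation. The sketch assumes that NumPy's `Generator.noncentral_chisquare` measures non-centrality as a sum of squared means, without the factor $\frac{1}{2}$ used for $\lambda$ above, so the boundary $\lambda = \frac{1}{2}$ corresponds to `nonc = 1.0`. The degrees of freedom, $\alpha$ and the number of draws are arbitrary illustrative choices.

```python
import numpy as np

def critical_w(k, df2, alpha=0.05, nonc=1.0, draws=200_000, seed=0):
    """Simulate z ~ F(k, df2) with non-centrality nonc and return the
    (1 - alpha) quantile of w = k z / (df2 + k z)."""
    rng = np.random.default_rng(seed)
    chi_num = (rng.noncentral_chisquare(k, nonc, size=draws) if nonc > 0
               else rng.chisquare(k, size=draws))
    z = (chi_num / k) / (rng.chisquare(df2, size=draws) / df2)
    w = k * z / (df2 + k * z)
    return np.quantile(w, 1 - alpha)

w_mse = critical_w(k=2, df2=20)               # boundary of H0: lambda <= 1/2
w_central = critical_w(k=2, df2=20, nonc=0.0) # classical F test, lambda = 0
print(w_mse, w_central)
```

For $\lambda = 0$, $w$ follows a Beta$(k/2, (n-m)/2)$ distribution; with $k = 2$ its upper 5 per cent point is $1 - 0.05^{2/(n-m)}$, which the simulated `w_central` should reproduce closely.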


(31) 




Since multicollinearity is closely linked to zero restrictions (dropping a variable or set of variables), (30) can be regarded as a U.M.P. test for "multicollinearity". We delete the set of variables $X_2$ $(n \times (m-r))$ from our partitioned model

$$y = X_1\beta_1 + X_2\beta_2 + u$$

when we consider $X_2$ to be multicollinear with $X_1$ $(n \times r)$, i.e., when the null hypothesis $\lambda \leq \frac{1}{2}$ is accepted.

In essence, then, the Mean Square Error criterion test takes into account both the bias and the variance, which gives it an operational advantage over the standard $F$ test mentioned at the beginning of the discussion.


Two alternative but weaker criteria of the Mean Square Error have recently been developed by Wallace [59]. He refers to them as the First Weak Mean Square Error and the Second Weak Mean Square Error criterion.
According to the first weak criterion, $\beta^{*}$ is better in average squared distance if
and $\lambda_m$ is the smallest eigenvalue of $(X'X)$.
The second weak MSE criterion is a test of the betterness of the restricted over the unrestricted estimator of $X\beta = E(y \mid X)$. $X\beta^{*}$ is said to be the better estimator of $E(y \mid X)$ in weak mean squared error iff

$$E(X\beta^{*} - X\beta)'(X\beta^{*} - X\beta) \leq E(X\hat{\beta} - X\beta)'(X\hat{\beta} - X\beta),$$

or equivalently

$$E(\beta^{*} - \beta)'X'X(\beta^{*} - \beta) \leq E(\hat{\beta} - \beta)'X'X(\hat{\beta} - \beta). \qquad (33)$$

A necessary and sufficient condition for (33) to hold is

$$\lambda \leq \frac{k}{2}.$$
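The second weak criterion can be checked in the same way. Treating $\beta$ and $\sigma^2$ as known for illustration, the sketch computes both sides of (33) as traces of $X'X$ times the exact MSE matrices. The gap between them comes out as $\sigma^2(k - 2\lambda)$, so it is non-negative exactly when $\lambda \leq k/2$. Names and data are illustrative.

```python
import numpy as np

def second_weak_gap(X, H, beta, h, sigma2):
    """E(b-beta)'X'X(b-beta) - E(beta*-beta)'X'X(beta*-beta) from the exact
    moments of OLS b and of the restricted estimator (25), plus lambda."""
    n, m = X.shape
    XtX = X.T @ X
    A = np.linalg.inv(XtX)
    M = H.T @ A @ H
    delta = H.T @ beta - h
    P = A @ H @ np.linalg.inv(M)
    bias = -P @ delta
    mse_star = sigma2 * (A - P @ H.T @ A) + np.outer(bias, bias)
    gap = sigma2 * m - np.trace(XtX @ mse_star)   # OLS side: tr(X'X sigma^2 (X'X)^{-1})
    lam = delta @ np.linalg.solve(M, delta) / (2 * sigma2)
    return gap, lam

rng = np.random.default_rng(11)
X = rng.normal(size=(30, 3))
H = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # k = 2: drop x_2 and x_3
h = np.zeros(2)
for b in (0.05, 1.0):
    gap, lam = second_weak_gap(X, H, np.array([1.0, b, b]), h, 1.0)
    print(b, round(lam, 3), gap >= 0)
```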
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To recapitulate the gist of the foregoing discussion, it may 
simply be stated that average squared distance criteria for linear 
restrictions in regression yield operational tests more appropriate 


for deciding the exclusion of variables in the event of multicollinearity. 


3.5 Principal Component Estimators 


The initial impetus to the use of principal component estimators in situations of multicollinearity was provided by Kendall [29]. Suppose we have a matrix $X$ of $n$ observations on $m$ variables, where the observations are expressed as deviations from the sample means. The principal components of $X$ are the artificial variables $Z_1, Z_2, \ldots, Z_m$,
which are linear combinations of the $X_j$'s so chosen that the variance of $Z_1$ is a maximum, the variance of $Z_2$ is a maximum subject to the condition that $Z_2$ is orthogonal to $Z_1$, and so forth. Let $a$ be an $m$-component column vector such that $a'a = 1$. The variance of $Xa$
is proportional to

$$(Xa)'(Xa) = a'X'Xa. \qquad (34)$$

In finding a normalized eigenvector $a_1$ ($a_1'a_1 = 1$) which maximizes (34), we seek the solution to the equation

$$(X'X - \lambda_1 I)a_1 = 0,$$

where $\lambda_1$ is the largest eigenvalue of $(X'X)$. It can be seen that $a_1$ is the eigenvector of $(X'X)$ corresponding to the eigenvalue $\lambda_1$; $Z_1 = Xa_1$ then constitutes the first principal component. The second principal component, $Z_2 = Xa_2$, where $a_2$ is the eigenvector corresponding to the second largest eigenvalue of $(X'X)$, is found by maximizing the objective function

$$\phi = a_2'X'Xa_2 - \gamma(a_2'a_2 - 1) - \mu(a_2'a_1).$$

Proceeding in this manner, we obtain all $m$ principal components of $X$, given by

$$Z = XA. \qquad (35)$$

Thus it turns out that $A$ is an orthogonal matrix composed of the normalized eigenvectors $a_j$ corresponding to the decreasing eigenvalues $\lambda_j$ of $(X'X)$. The matrix of eigenvalues
satisfies


To explain how principal component analysis resolves multicollinearity, let us suppose that the analysis has been applied to the variables $X_j$. The regression model can then be written in terms of the components as

$$y = X\beta + u = ZA'\beta + u = Z\alpha + u, \qquad (36)$$

where $\alpha = A'\beta$.

In this case, the Gauss-Markov theorem is applicable, and the least squares estimator $\hat{\alpha}$ of $\alpha$ is then obtained. We have, from (36),

$$\hat{\beta} = A\hat{\alpha}. \qquad (37)$$
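The decomposition (35) to (37) can be reproduced with an eigen-decomposition. The sketch (simulated, centred data; eigenvalues sorted in decreasing order as in the text) confirms that $Z'Z$ is diagonal and that $A\hat{\alpha}$ reproduces the o.l.s. estimate when no component is deleted.

```python
import numpy as np

rng = np.random.default_rng(10)
z = rng.normal(size=50)
X = np.column_stack([z, z + 0.02 * rng.normal(size=50), rng.normal(size=50)])
X = X - X.mean(axis=0)                  # deviations from the sample means
y = X @ np.array([1.0, 0.5, -1.0]) + 0.1 * rng.normal(size=50)
y = y - y.mean()

lam, A = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]           # decreasing eigenvalues, as in the text
lam, A = lam[order], A[:, order]
Z = X @ A                               # all principal components, (35)
alpha_hat = (Z.T @ y) / lam             # Z'Z = Lambda is diagonal, so OLS is elementwise
beta_hat = A @ alpha_hat                # (37)
print(beta_hat)
```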


In Kendall's view, a better estimator of $\beta$ than the ordinary least squares (o.l.s.) estimator is afforded by deleting from $\hat{\alpha}$ those components corresponding to small eigenvalues. The estimator so obtained is referred to as the principal component estimator, denoted by $b^{*}$. In symbols,

$$b^{*} = A\hat{\alpha}^{*}, \qquad (38)$$

where $\hat{\alpha}^{*} = \Delta\hat{\alpha}$ and $\Delta$ is a diagonal matrix with $\delta$ (a binary $m$-component vector, each element of which is either 0 or 1) down the principal diagonal. It can be shown that $b^{*}$ is distributed as

The justification for the betterness of $b^{*}$ over $\hat{\beta}$ is evident from the following. Let $b_i^{*}(h)$ represent the principal component estimate of $\beta_i$ obtained by deletion of component $h$. Then $\operatorname{var}\hat{\beta}_i$ exceeds $\operatorname{var} b_i^{*}(h)$ by the amount $\sigma^2 a_{ih}^2/\lambda_h$, which is necessarily non-negative.
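The estimator (38) and the variance saved by deleting a component can be checked numerically. The sketch assumes $\sigma^2 = 1$ and simulated data; `keep` plays the role of the 0/1 vector $\delta$, and the helper names are illustrative.

```python
import numpy as np

def pc_parts(X, y):
    """Eigenvalues (decreasing), eigenvectors A and alpha_hat = Lambda^{-1} A'X'y."""
    lam, A = np.linalg.eigh(X.T @ X)
    order = np.argsort(lam)[::-1]
    lam, A = lam[order], A[:, order]
    return lam, A, (A.T @ X.T @ y) / lam

def pc_estimator(A, alpha_hat, keep):
    """b* = A Delta alpha_hat, (38); keep is the 0/1 vector delta."""
    return A @ (keep * alpha_hat)

def pc_variances(A, lam, keep, sigma2=1.0):
    """var b*_i = sigma^2 * sum_j delta_j a_ij^2 / lambda_j."""
    return sigma2 * (A**2 / lam) @ keep

rng = np.random.default_rng(11)
z = rng.normal(size=50)
X = np.column_stack([z, z + 0.02 * rng.normal(size=50), rng.normal(size=50)])
y = X @ np.array([1.0, 0.5, -1.0]) + 0.1 * rng.normal(size=50)

lam, A, alpha_hat = pc_parts(X, y)
keep = np.array([1.0, 1.0, 0.0])        # delete the component with the smallest root
b_star = pc_estimator(A, alpha_hat, keep)
full = pc_variances(A, lam, np.ones(3))
drop = pc_variances(A, lam, keep)
print(b_star, full - drop)
```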


The desirability of using a principal component estimator rather than an o.l.s. estimator has been further delineated by McCallum [40]. In essence, his proposal entails adopting the Mean Square Error as a criterion for selection of the components. More specifically, the principal component estimate $b_i^{*}$ of a single parameter is better than the o.l.s. estimate $\hat{\beta}_i$ by the Mean Square Error criterion if

$$\mathrm{MSE}(b_i^{*}) < \mathrm{MSE}(\hat{\beta}_i).$$

A component is therefore deleted if its exclusion reduces the mean square error of estimating $\beta_i$. Since

$$\mathrm{MSE}(b_i^{*}(h)) = \operatorname{var} b_i^{*}(h) + [\operatorname{bias} b_i^{*}(h)]^2 = \sigma^2\sum_{j \neq h}\frac{a_{ij}^2}{\lambda_j} + \left(Eb_i^{*}(h) - \beta_i\right)^2 \qquad (39)$$

and

$$\operatorname{bias} b_i^{*}(h) = \sum_{j \neq h} a_{ij}\alpha_j - \sum_{j=1}^{m} a_{ij}\alpha_j = -a_{ih}\alpha_h = -a_{ih}\sum_{q=1}^{m} a_{qh}\beta_q \quad \text{(from (36))}, \qquad (40)$$

we see that the Mean Square Error criterion per se is not operational. In (39), both the true value of $\beta_i$, which we want to estimate, and $\sigma^2$ are unknown. Moreover, (40) tells us that the magnitude of the bias depends upon the true values of $\beta_1, \beta_2, \ldots, \beta_m$. Fortunately, though, this difficulty may be overcome if a priori knowledge of the relative parameter magnitudes, or estimates of them, is available. Such information will indicate situations in which $b_i^{*}$ is better than $\hat{\beta}_i$ according to the Mean Square Error criterion.
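To see why the criterion needs the unknown parameters, the sketch evaluates (39) and (40) for a simulated design with $\sigma^2 = 1$ and two hypothetical true $\beta$ vectors. Comparing the bias term $a_{ih}^2\alpha_h^2$ with the variance saved, $\sigma^2a_{ih}^2/\lambda_h$, shows that deleting component $h$ lowers the MSE of every $b_i^{*}(h)$ with $a_{ih} \neq 0$ exactly when $\alpha_h^2 < \sigma^2/\lambda_h$.

```python
import numpy as np

def mse_after_deleting(A, lam, beta, h, sigma2):
    """MSE of every b*_i(h) from (39)-(40); needs the true beta and sigma^2."""
    alpha = A.T @ beta                      # alpha = A'beta, from (36)
    keep = np.ones(len(lam))
    keep[h] = 0.0
    var = sigma2 * (A**2 / lam) @ keep
    bias = -A[:, h] * alpha[h]              # (40)
    return var + bias**2

rng = np.random.default_rng(12)
z = rng.normal(size=50)
X = np.column_stack([z, z + 0.02 * rng.normal(size=50), rng.normal(size=50)])
lam, A = np.linalg.eigh(X.T @ X)
order = np.argsort(lam)[::-1]
lam, A = lam[order], A[:, order]
sigma2 = 1.0
var_ols = sigma2 * (A**2 / lam).sum(axis=1)

beta_good = 2.0 * A[:, 0]     # alpha_3 = 0: nothing is lost by deleting component 3
beta_bad = 100.0 * A[:, 2]    # alpha_3 = 100: alpha_3^2 far exceeds sigma^2 / lambda_3
mse_good = mse_after_deleting(A, lam, beta_good, 2, sigma2)
mse_bad = mse_after_deleting(A, lam, beta_bad, 2, sigma2)
print(var_ols, mse_good, mse_bad)
```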


Farebrother [14] has extended McCallum's analysis to the general criterion of minimizing a weighted sum of the elements of the matrix

$$\mathrm{MSE}(b^{*}) = E(b^{*} - \beta)(b^{*} - \beta)'. \qquad (41)$$

The off-diagonal elements of (41),

$$E(b_i^{*} - \beta_i)(b_j^{*} - \beta_j),$$

are referred to as the mean product error of $b_i^{*}$ and $b_j^{*}$, or $\mathrm{MPE}(b_i^{*}, b_j^{*})$.

The minimum weighted mean square error (MWMSE) criterion seeks to minimize the function

$$\sum_{i=1}^{m}\sum_{j=1}^{m} f_{ij}\,\mathrm{MPE}(b_i^{*}, b_j^{*}) = \operatorname{tr}[\mathrm{MSE}(b^{*})F], \qquad (42)$$

where $F$ is an $m \times m$ matrix of fixed weights.
Since it follows from (39) and (40) that

$$\operatorname{tr}[\mathrm{MSE}(b^{*})F] = \operatorname{tr}[\operatorname{var}(\hat{\beta})F] + \operatorname{tr}[A(I-\Delta)(\alpha\alpha' - \sigma^2\Lambda^{-1})(I-\Delta)A'F],$$

with $\Lambda = \operatorname{diag}(\lambda_1, \ldots, \lambda_m)$, of which the first term on the right side is constant, Farebrother is able to show that (42) is equivalent to minimizing

$$\operatorname{tr}[A(I-\Delta)(\alpha\alpha' - \sigma^2\Lambda^{-1})(I-\Delta)A'F].$$

The principal component estimator $b^{*}$ is thus better than the o.l.s. estimator $\hat{\beta}$ by the MWMSE criterion if

$$\operatorname{tr}[A(I-\Delta)(\alpha\alpha' - \sigma^2\Lambda^{-1})(I-\Delta)A'F] < 0. \qquad (43)$$

Deletion of a component is therefore desirable if its exclusion reduces the left side of (43).
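Farebrother's decomposition can be checked numerically for an arbitrary orthogonal $A$, eigenvalues $\lambda_j$, true $\alpha$ and weight matrix $F$ (all illustrative values below, with $\sigma^2 = 1$): the weighted MSE of $b^{*}$ equals that of o.l.s. (all components kept) plus the trace term on the left of (43).

```python
import numpy as np

def weighted_mse(A, lam, alpha, keep, F, sigma2):
    """tr[MSE(b*) F] from the exact moments of b* = A Delta alpha_hat."""
    D = np.diag(np.asarray(keep, dtype=float))
    var_b = sigma2 * A @ D @ np.diag(1.0 / lam) @ D @ A.T
    bias = -A @ (np.eye(len(lam)) - D) @ alpha
    return np.trace((var_b + np.outer(bias, bias)) @ F)

def farebrother_term(A, lam, alpha, keep, F, sigma2):
    """Left side of (43): tr A(I-Delta)(alpha alpha' - sigma^2 Lambda^{-1})(I-Delta)A'F."""
    E = A @ (np.eye(len(lam)) - np.diag(np.asarray(keep, dtype=float)))
    middle = np.outer(alpha, alpha) - sigma2 * np.diag(1.0 / lam)
    return np.trace(E @ middle @ E.T @ F)

rng = np.random.default_rng(13)
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # an arbitrary orthogonal matrix
lam = np.array([5.0, 1.0, 0.01])
alpha = np.array([1.0, 0.5, 0.05])
G = rng.normal(size=(3, 3))
F = G @ G.T                                    # arbitrary positive semi-definite weights
keep = [1, 1, 0]
lhs = weighted_mse(A, lam, alpha, keep, F, 1.0)
rhs = weighted_mse(A, lam, alpha, [1, 1, 1], F, 1.0) + farebrother_term(A, lam, alpha, keep, F, 1.0)
print(lhs, rhs)
```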


The work of McCallum and Farebrother described above has been concerned with finding a better estimate of $\beta$. Recently, Mitchell [41] chose instead to focus on obtaining a good estimate of $X\beta$. Adopting the second weak MSE criterion, he suggests minimizing the "average prediction mean square error" (APMSE) of $Xb^{*}$ as an estimate of $X\beta$:

$$\mathrm{APMSE}(Xb^{*}) = E(Xb^{*} - X\beta)'(Xb^{*} - X\beta) = \operatorname{tr}[\mathrm{MSE}(b^{*})X'X],$$

which is (42) with $F = X'X$.
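With $F = X'X = A\Lambda A'$ we have $A'FA = \Lambda$, so the trace term of Farebrother's criterion collapses to $\sum_{h\ \text{deleted}}(\lambda_h\alpha_h^2 - \sigma^2)$: under Mitchell's criterion, deleting component $h$ helps exactly when $\lambda_h\alpha_h^2 < \sigma^2$. The sketch checks this with illustrative values.

```python
import numpy as np

rng = np.random.default_rng(14)
A, _ = np.linalg.qr(rng.normal(size=(3, 3)))   # orthogonal eigenvector matrix
lam = np.array([5.0, 1.0, 0.01])
alpha = np.array([1.0, 0.5, 3.0])              # hypothetical true alpha = A'beta
sigma2 = 1.0
XtX = A @ np.diag(lam) @ A.T                   # F = X'X for a design with these roots
keep = np.array([1.0, 1.0, 0.0])               # delete the third component

E = A @ np.diag(1.0 - keep)
term = np.trace(E @ (np.outer(alpha, alpha) - sigma2 * np.diag(1 / lam)) @ E.T @ XtX)
# Separable form: sum over deleted h of (lambda_h * alpha_h^2 - sigma^2).
shortcut = np.sum((1.0 - keep) * (lam * alpha**2 - sigma2))
print(term, shortcut)
```

Here $\lambda_3\alpha_3^2 = 0.09 < \sigma^2$, so deleting the third component lowers the APMSE even though $\alpha_3$ itself is the largest coefficient.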




As a practical illustration of the method, we once again use our French economy data (Chapter I). On applying a principal component analysis to the sample correlation matrix using CEIGS [8], the results shown in Tables IV and V are obtained.
It is seen that the first two components account for nearly all of the total variability. Given the small value of $\lambda_3$, the contribution of the third component can be neglected.


TABLE IV

Eigenvalues of Correlation Matrix for
the French Economy Data

Component   Eigenvalue   Percentage of Variability
    1        2.08388              69.46
    2        0.91505              30.50
    3        0.00107               0.04


We thus have

$$Z_1 = 0.68104X_1 + 0.26960X_2 + 0.68081X_3,$$
$$Z_2 = 0.18971X_1 - 0.96297X_2 + 0.19156X_3.$$


TABLE V

Normalized Eigenvectors for the First Two Components

Variable   Component 1   Component 2
  X1         0.68104       0.18971
  X2         0.26960      -0.96297
  X3         0.68081       0.19156
Since the correlations of $y$ with the $X$'s are respectively 0.98418, 0.26591 and 0.98477, we obtain

$$\hat{\alpha}_1 = \frac{1}{2.08388}[(0.68104)(0.98418) + (0.26960)(0.26591) + (0.68081)(0.98477)] = 0.67777,$$

$$\hat{\alpha}_2 = \frac{1}{0.91505}[(0.18971)(0.98418) + (-0.96297)(0.26591) + (0.19156)(0.98477)] = 0.13036.$$

From equation (38),

$$b^{*} = \begin{bmatrix} 0.68104 & 0.18971 \\ 0.26960 & -0.96297 \\ 0.68081 & 0.19156 \end{bmatrix}\begin{bmatrix} 0.67777 \\ 0.13036 \end{bmatrix} = \begin{bmatrix} 0.48632 \\ 0.05720 \\ 0.48640 \end{bmatrix}.$$

The regression equation may now be re-expressed in terms of the standardized variables,

$$\hat{y} = 0.48632X_1 + 0.05720X_2 + 0.48640X_3. \qquad (44)$$


In accordance with the theory discussed above, the estimates $b_i^{*}$ of $\beta_i$ are biased but have smaller variance than the o.l.s. estimator. Equation (44) is thus the regression equation for our French economy data corrected for multicollinearity.
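The arithmetic of the French economy example can be reproduced from the published figures alone. In correlation form, $X'X$ is the correlation matrix of the regressors and $X'y$ is the vector of their correlations with $y$, so $\hat{\alpha}_j = a_j'r_{xy}/\lambda_j$. The sketch uses only the numbers reported in Tables IV and V and in the text.

```python
import numpy as np

# Figures as reported in Tables IV and V and in the text.
lam = np.array([2.08388, 0.91505])               # first two eigenvalues
A2 = np.array([[0.68104, 0.18971],
               [0.26960, -0.96297],
               [0.68081, 0.19156]])              # eigenvectors of the retained components
r_xy = np.array([0.98418, 0.26591, 0.98477])     # correlations of y with X_1, X_2, X_3

alpha_hat = (A2.T @ r_xy) / lam                  # alpha_j = a_j' r_xy / lambda_j
b_star = A2 @ alpha_hat                          # (38) with the third component deleted
print(np.round(alpha_hat, 5), np.round(b_star, 5))
```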
Our discussion in this section has therefore elucidated the utility of the method of principal components in achieving orthogonalization of the regression calculation. In addition, the method has the desirable feature of involving a minimum of assumptions and, given the availability of electronic computers, it is basically easy to apply. Two limitations to the procedure, however, exist, the first being that it works only for linear models. Secondly, in those situations where a priori information about the grouping of the variables is available, other procedures are more appropriately employed.


3.6 Factor Analysis 


In a 1966 paper, Scott [50] proposes applying the well-known procedure of factor analysis to resolve multicollinearity in regression analysis. Essentially, his method involves adding the correlation coefficients between the dependent variable and the independent variables to the correlation matrix. Applying factor analysis to the augmented matrix, Scott then derives the appropriate regression coefficients from the factors obtained. To understand his procedure, a brief look at the factor model is necessary.


The factor model can be expressed briefly as

$$x = Bf + \varepsilon, \qquad (45)$$

where

$x$ is an $m \times 1$ vector of $m$ standardized variables;
$B$ is the $m \times p$ matrix of factor loadings, $p < m$;
$f$ is a $p \times 1$ vector of factors;
$\varepsilon$ is the $m \times 1$ error vector, which is distributed independently of $f$. Both $f$ and $\varepsilon$ have multivariate normal distributions, with
$E(\varepsilon) = 0$ and $E(f) = 0$,
$E(\varepsilon\varepsilon') = V$, a diagonal matrix,
$E(ff') = I$, i.e. the factors are uncorrelated and have unit variance.

It then follows that the covariance matrix of $x$ is given by

$$\Sigma = BB' + V. \qquad (46)$$


A number of different methods exist for determining the matrix $B$, including the principal factor solution, the maximum likelihood method, the Whittle least squares method, canonical factor analysis and the Jöreskog method.


Following determination of the matrix $B$, the factors can be obtained in at least three different forms:

$$f = (B'B)^{-1}B'x, \qquad (47)$$

$$f = B'\Sigma^{-1}x. \qquad (48)$$
Scott derives from the factor model a stochastic linear equation, called factor analysis regression, which can be used in place of least squares regression when multicollinearity is present.

Assuming $x_1$ is the dependent variable and that $N = B'\Sigma^{-1}$, we have from (46) $f = Nx$. Substituting $Nx$ for $f$ in (45), Scott then derives by simple algebraic manipulation the factor analysis regression equation as
Let $b_i'$ be a row vector of the matrix $B$ and $n_j$ be a column vector of the matrix $N$. Equation (50) then assumes the simpler form

$$x_1 = \frac{b_1'n_2}{1 - b_1'n_1}x_2 + \frac{b_1'n_3}{1 - b_1'n_1}x_3 + \cdots + \frac{b_1'n_m}{1 - b_1'n_1}x_m. \qquad (51)$$

Putting $W = BN$, the factor analysis equation reduces to

where $w_{ij}$ is the element of $W$ in the $i$-th row and $j$-th column.


In general, any one of the variables may be the dependent variable. Suppose the $i$-th variable is selected as the dependent variable
where all variables except $x_i$ appear on the right.


If the factor solution given in (47) is employed, the linear model derived by assuming $W = B(B'B)^{-1}B'$ will be

$$x_i = \sum_{j \neq i}\frac{w_{ij}}{1 - w_{ii}}\,x_j,$$

where all variables except $x_i$ appear on the right.


On the other hand, if we use (49) and denote $B(I + B'V^{-1}B)^{-1}B'V^{-1}$ by $W$, we obtain the linear model

$$x_i = \sum_{j \neq i}\frac{w_{ij}}{1 - w_{ii}}\,x_j.$$

As before, all variables except $x_i$ appear on the right.
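The versions of the factor analysis regression differ only in $W$. The sketch below uses a hypothetical one-factor model with known loadings $B$ and uniquenesses $V$ (in practice both are estimated), builds $W$ from (47), from (49) and from Thomson's form $BB'\Sigma^{-1}$, and turns a row of $W$ into the regression coefficients $w_{ij}/(1 - w_{ii})$. That (49) and Thomson's form coincide is the standard matrix inversion identity $B'\Sigma^{-1} = (I + B'V^{-1}B)^{-1}B'V^{-1}$.

```python
import numpy as np

def fa_regression_weights(W, i):
    """Coefficients of x_i on the other variables implied by x ~ W x:
    x_i = sum over j != i of w_ij / (1 - w_ii) * x_j."""
    coef = W[i] / (1.0 - W[i, i])
    return np.delete(coef, i)

# Hypothetical one-factor model with m = 3 standardized variables.
B = np.array([[0.9], [0.8], [0.7]])              # loadings (m x p, p = 1)
V = np.diag(1.0 - (B**2).ravel())                # uniquenesses, so diag(Sigma) = 1
Sigma = B @ B.T + V                              # (46)
Vinv = np.linalg.inv(V)
p = B.shape[1]

W47 = B @ np.linalg.inv(B.T @ B) @ B.T                            # from (47)
W49 = B @ np.linalg.inv(np.eye(p) + B.T @ Vinv @ B) @ B.T @ Vinv  # from (49)
W_thomson = B @ B.T @ np.linalg.inv(Sigma)                        # Thomson's form

print(fa_regression_weights(W47, 0), fa_regression_weights(W49, 0))
```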


To conclude, we note with interest Scott's recommendation that stochastic linear equations derived from factor analysis are especially appropriate for economic data involving high multicollinearity or errors in the variables. The rationale is that the coefficients so obtained are better from the viewpoint of "their economic meaning and theoretical expectation" than those estimated by traditional least squares. Thus, given the availability nowadays of electronic computers for the iterative-type calculations needed, factor analysis regression may well see more use in econometrics than has hitherto occurred.
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3.7 Ridge Analysis 


In all the preceding procedures for resolving multicollinearity, the estimator of $\beta$ is the least squares vector $\hat{\beta}$. In 1970, Hoerl and Kennard [24] published an alternative method of estimation known as ridge regression, in which a biased estimator $\beta^{*}$ is introduced, namely,

$$\beta^{*} = (X'X + kI)^{-1}X'y, \qquad k \geq 0.$$

$\beta^{*}$ is related to the least squares estimator in the form

$$\beta^{*} = [I + k(X'X)^{-1}]^{-1}\hat{\beta}.$$

In addition, for $k \neq 0$, $\beta^{*\prime}\beta^{*} < \hat{\beta}'\hat{\beta}$.


Basically, the idea of ridge regression is that when a small positive number $k$ is added to the diagonal elements of $X'X$, the instability of the estimates is lowered. This can be seen from the fact that if $\lambda_i$ is an eigenvalue of $X'X$, then $\lambda_i + k$ is an eigenvalue of $[X'X + kI]$, so that the smallest roots, which inflate the variances through the factors $1/\lambda_i$, are moved away from zero. More specifically, Hoerl and Kennard have shown that choice of an optimum $k$ can, in fact, reduce the variance and lead to a minimum value of the mean square error of the estimate of $\beta$.
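The relations above can be confirmed numerically. The sketch (simulated near-collinear data, $k = 0.1$ chosen arbitrarily) computes $\beta^{*}(k)$ directly and from $\hat{\beta}$, checks the eigenvalue shift, and confirms that $\beta^{*}$ is shorter than $\hat{\beta}$.

```python
import numpy as np

rng = np.random.default_rng(15)
z = rng.normal(size=30)
X = np.column_stack([z, z + 0.01 * rng.normal(size=30)])
y = X @ np.array([1.0, 1.0]) + 0.1 * rng.normal(size=30)
XtX, Xty = X.T @ X, X.T @ y
I = np.eye(2)

def ridge(k):
    """beta*(k) = (X'X + kI)^{-1} X'y."""
    return np.linalg.solve(XtX + k * I, Xty)

b_ols = ridge(0.0)
k = 0.1
b_k = ridge(k)
via_ols = np.linalg.solve(I + k * np.linalg.inv(XtX), b_ols)   # [I + k(X'X)^{-1}]^{-1} b
print(b_ols, b_k)
```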


The optimal value of $k$ is manifested by a number of simultaneous conditions, namely, the stabilization of the estimates, the disappearance of unreasonably large absolute values of the coefficients, the correction of wrong signs of the coefficients, and a residual sum of squares that has not been inflated to an unreasonable value. To detect this optimal $k$, Hoerl and Kennard utilize the Ridge Trace, a two-dimensional plot of $\beta^{*}(k)$ and the residual sum of squares for a number of values of $k$ in the interval $[0, 1]$. In sum, the Ridge Trace reflects the complex interrelationships existing among the non-orthogonal independent variables and the effect of these interrelationships on the estimate of $\beta$.
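A Ridge Trace is simply a table of $\beta^{*}(k)$ and the residual sum of squares over a grid of $k$, which is then plotted. The sketch computes that table for simulated standardized data on $k \in [0, 1]$; plotting is left out to keep it dependency-free, and the function name is illustrative. As $k$ grows, the residual sum of squares cannot fall and the length of $\beta^{*}(k)$ cannot grow, which the checks rely on.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Rows of (k, beta*(k)..., RSS(k)) for each k: the numbers a Ridge Trace plots."""
    XtX, Xty = X.T @ X, X.T @ y
    rows = []
    for k in ks:
        b = np.linalg.solve(XtX + k * np.eye(X.shape[1]), Xty)
        rss = np.sum((y - X @ b) ** 2)
        rows.append((k, *b, rss))
    return np.array(rows)

rng = np.random.default_rng(16)
z = rng.normal(size=30)
X = np.column_stack([z, z + 0.01 * rng.normal(size=30), rng.normal(size=30)])
X = (X - X.mean(axis=0)) / X.std(axis=0)   # standardized regressors
y = X @ np.array([1.0, 1.0, -0.5]) + 0.3 * rng.normal(size=30)
y = y - y.mean()
trace = ridge_trace(X, y, np.linspace(0.0, 1.0, 11))
print(np.round(trace, 3))
```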


As the authors of this technique have pointed out, ridge regression presents two advantages over procedures such as principal components and zero restrictions, which do not portray how multicollinearity is actually causing instability, over-estimation and incorrect signs. In addition, "they can actually amplify the deficiencies of ordinary least squares for non-orthogonal data". We note, of course, the presence of subjectivity in the interpretation of the Ridge Trace to obtain the optimum $k$.


Most recently, Conniffe and Stone [12] have made some critical comments on ridge regression. Their principal contention is that Hoerl and Kennard's proof that the mean square error of $\beta^{*}$ is less than that of $\hat{\beta}$ for certain values of $k$ is valid only if $k$ is assumed known. The ridge procedure, however, involves the estimation of $k$. Mayer and Willke [39], confirming this oversight of Hoerl and Kennard, have listed two resultant weaknesses in ridge analysis. First, it is not possible to state with absolute certainty that the estimator chosen has smaller total mean square error than the variance of the least squares estimator. Secondly, the moments of $\beta^{*}$ obtained for fixed $k$ are not the moments of the estimator being used.
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Conniffe and Stone also argue, quite correctly, that Hoerl 
and Kennard provide no proof that the appropriate value of k can be 
recognized by the four criteria discussed previously. Their additional 
comment is that the second and third criteria entail a lot of prior 
knowledge which they claim a researcher rarely has. In this regard, 
it might be countered that in economic situations, at least, theory 
has developed to a stage where the true nature of the variables' 
intercorrelation is known. For example, one would expect the coeffi- 
cient of the rate of change in wages to have positive sign in a price 


and wage change relationship. 


A third point of Conniffe and Stone relates to the stabilization criteria. By showing that even if the $X_i$ are orthogonal, the $\hat{\beta}^*$ values change more slowly with increasing $k$, they concluded that the tendency towards stability is not a consequence of the ill-conditioning of $(X'X)$. Their final critical comment refers to the fact that if $(X'X)$ is singular, the ordinary least squares estimator $\hat{\beta}$ does not exist, whereas, since

$$\hat{\beta}^* = (X'X + kI)^{-1}X'y$$

and $(X'X + kI)$ is non-singular, $\hat{\beta}^*$ does exist; its values, they argue, are then nonsensical. However, Conniffe and Stone's argument overlooks the relationship existing between $\hat{\beta}^*$ and $\hat{\beta}$, namely

$$\hat{\beta}^* = \left[I + k(X'X)^{-1}\right]^{-1}\hat{\beta},$$

which also involves the inverse of $(X'X)$. In sum, taking into account the weaknesses of their critique, Conniffe and Stone's conclusion that "We believe ridge estimators are unlikely to be of practical use to the researcher with data to analyse" would seem somewhat too strong.
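Both facts used in this exchange can be checked numerically. The Python/NumPy sketch below uses made-up, deliberately collinear data (not the thesis data; the names `ridge`, `via_ols`, `lam_i` and `b_i` are illustrative). It verifies the identity $\hat{\beta}^* = [I + k(X'X)^{-1}]^{-1}\hat{\beta}$. For the orthogonal case, where $\hat{\beta}^*_i = \lambda_i\hat{\beta}_i/(\lambda_i + k)$, it then shows that $|d\hat{\beta}^*_i/dk| = \lambda_i|\hat{\beta}_i|/(\lambda_i + k)^2$ falls as $k$ grows.

```python
import numpy as np

# Made-up, nearly collinear design (illustrative only, not the thesis data).
rng = np.random.default_rng(0)
n, m = 40, 3
z = rng.normal(size=n)
X = np.column_stack([z + 0.01 * rng.normal(size=n) for _ in range(m)])
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)

XtX, Xty, I = X.T @ X, X.T @ y, np.eye(m)
beta_ols = np.linalg.solve(XtX, Xty)          # requires X'X to be non-singular

def ridge(k):
    """Ridge estimator beta*(k) = (X'X + kI)^{-1} X'y."""
    return np.linalg.solve(XtX + k * I, Xty)

# Hoerl-Kennard relation: beta*(k) = [I + k (X'X)^{-1}]^{-1} beta_hat.
k = 0.5
via_ols = np.linalg.solve(I + k * np.linalg.inv(XtX), beta_ols)
print(np.allclose(ridge(k), via_ols))          # True

# Orthogonal case, X'X = diag(lam): beta*_i(k) = lam_i / (lam_i + k) * beta_hat_i,
# so |d beta*_i / dk| = lam_i * |beta_hat_i| / (lam_i + k)**2, which decreases in k.
lam_i, b_i = 4.0, 1.5
ks = np.array([0.0, 0.5, 1.0, 2.0])
slopes = lam_i * b_i / (lam_i + ks) ** 2
print(slopes)
```

The identity goes through `beta_ols`, so it needs $(X'X)^{-1}$. That is the dependence the text points to.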


” , go0T2a~Teve 


Laculsthbs shen? |. uw 
Sad , 


’ 


PY ¢ q. 
tebty To jol & Licins 


ena ane al eed 9 


a 7 7 = = : 7 
- - 7 = () 
Pars = 2 
e Bln : 
am = Ae: 
7 _ nf ) 
fesot jade ,yisaetsws gtlis , 
7 s% @ 
ed ano 2 Yo owlev szairqerqg 
plevolvesg bs 


eteotits brt4s bas Brosse oft Jedd a? gnemm 
7 


suyis o81f Siete 


g of tha ieotg om ehivot¢g bream) bo 
_ 7 
ugetb abyeciso tot oft vd bas NOD: ? 


> 


hrs. ethno : 


- 


ary 


7 a 
ai 


t 


BSeh: 6 miotio 


Yao7 doidw sgbs (woo! 
_ 


f : : —_— ao hats 102 d tes hers at 
7 . sant*nutke vino rs ard bets of A cs 
yroads .gassI gn ,enotsavike simondas nt 36m 5 Pelt eS 
4 
7 - - : 7 ab ~ ,. 
‘asl di ivav ants to Ssiuden sist « fy srodyv ssA5e & a7 bsqalsvsh a mn 
- i? 
7 = - ao 
a 4 ~ Tr ‘Twin j 2 1¢ § 
\ -lTis6> sd? i209) bfirow O «Se . om .fmwonn &4 noktels1Ieo%s @ 
7 a 
eotyvq & al ogie eviiureoq evan 93 yaw of sii5 to 9387 Sag 30 3a [> 
.¢iieootzsiss squade egew bas 
5, 7 ; 
di? oF getalset 20032 iw sitinae) ?o mnfod DTEdG a 
- —— 
$16 : a be J b a 873i 79 soi saakinda 2 
. : ae 
7 ne : oe eau 
« & gatesesoat dobry vieol= st she (Ow eouts £ * Ienogeds706 
; ; a 
7 
R + P] ’ P - ’ 
sansupreysos s Jon Vif . yAWOS 72abH sit zgert bebulsaep Vers 
7 ‘ ; —_ 7 
@4 etai19% jn mo fi 5 Lsati- os 4 Kx) aot ft vy~L I} eld io 
3 . bap% + . a 
WcJHmi ISH |e rsUup 7 1 io . ; 4 ( ) tt 2sitfs goed soda 
P 7 - Fr" 7 b 7 : . 4 
Of + { 4 ) = ; , Tava HH .SaLR25 tom as0b 8 
a 
toulav thoi tus sa0b telesoke-non ak iad + 27RP 
« : wae 
7 ’ 
ofa edoolisve in ge J ) } -tsvewoH «.Iepke 
f 7, hr » os Wien Oe ‘ ie . aime 
. ae. & @ 2 6198 rRolieixns aiiemehjaeis 
a 
= °F om 
~ y pay ¢ , ‘ - 4 
“a ( \fe au) r LL) a oe 
otnt Retwes ymue mi -. (0°R) 30 setavnt sid onfe eviews 
notevlones a‘ ecode bon i -, ial sista? ic eden’ 
ix 1p ae =] > cri apart ¢2442319 TiShi oO -32Roml tow sft 4 
ser [ko l32674 7 1 oF Visa) fru 8th & IHAbolIes Sgbis oveb a6. 
2 - 7 > 


efadalie esse blimw 
mo) 


7 


"oa Lari of jed eb ES 


ijiw sefotkee 
= 


62. 


3.8 Marquardt Generalized Inverse Estimators 


Marquardt [36] has proposed the use of another class of biased estimators, termed generalized inverse estimators, which share many of the properties of ridge estimators. They are, however, more relevant when $X$ is of less than full column rank, so that $X'X$ is singular.


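Let $X'X$ be diagonalized into its matrix of ordered eigenvalues by an orthogonal matrix $J$ such that

$$J'(X'X)J = D$$

where

$$D = \operatorname{diag}(\lambda_1, \lambda_2, \ldots, \lambda_m), \qquad \lambda_1 \ge \lambda_2 \ge \cdots \ge \lambda_m \ge 0 .$$

Suppose $X'X$ is of rank $r$, so that the last $(m-r)$ elements of $D$ are zero, or nearly so. In the latter case, a criterion for determining the rank $r$ is to preselect a small positive tolerance $w$ and then choose the smallest $r$ satisfying

$$\frac{\sum_{i=r+1}^{m}\lambda_i}{\operatorname{trace} D} < w .$$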


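To obtain the inverse of $(X'X)$, $J$ and $D$ are both partitioned according to the first $r$ and the remaining $(m-r)$ eigenvalues:

$$J = [\,J_r \;\; J_{m-r}\,], \qquad D = \begin{bmatrix} D_r & 0 \\ 0 & D_{m-r} \end{bmatrix}.$$

Since $D_{m-r} \approx 0$, its contribution to the inverse is set equal to zero. The inverse of $(X'X)$ thus becomes

$$(X'X)_r^{+} = J_r D_r^{-1} J_r' = \sum_{i=1}^{r} \lambda_i^{-1} j_i j_i' \qquad (53)$$

where $j_i$ is the $i$-th column of $J$.

The class of generalized inverse estimators is then defined by

$$\hat{\beta}_r^{+} = (X'X)_r^{+} X'y$$

where $(X'X)_r^{+}$ is as given in (53).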
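The construction of $(X'X)_r^{+}$ and $\hat{\beta}_r^{+} = (X'X)_r^{+}X'y$ can be sketched in a few lines of Python/NumPy. Here $r$ is taken as the smallest rank whose discarded eigenvalues satisfy $\sum_{i>r}\lambda_i / \operatorname{trace} D < w$. This is an illustration under stated assumptions, not Marquardt's own program. The function name `generalized_inverse_estimator`, the tolerance `w = 1e-6` and the toy data, in which the third column exactly duplicates the first, are all hypothetical.

```python
import numpy as np

def generalized_inverse_estimator(X, y, w):
    """Sketch of beta_r^+ = (X'X)_r^+ X'y with r chosen by the trace-ratio rule."""
    lam, J = np.linalg.eigh(X.T @ X)        # eigh returns ascending eigenvalues
    order = np.argsort(lam)[::-1]           # reorder so lam_1 >= ... >= lam_m
    lam, J = lam[order], J[:, order]
    trace_D = lam.sum()
    m = len(lam)
    # smallest r whose discarded eigenvalues are a fraction < w of trace D
    r = next(r for r in range(1, m + 1) if lam[r:].sum() / trace_D < w)
    Jr = J[:, :r]
    XtX_plus = Jr @ np.diag(1.0 / lam[:r]) @ Jr.T   # sum of lam_i^{-1} j_i j_i'
    return r, XtX_plus @ (X.T @ y)

# Toy data: the third column duplicates the first, so X'X has rank 2.
rng = np.random.default_rng(1)
X2 = rng.normal(size=(40, 2))
X = np.column_stack([X2, X2[:, 0]])
y = X @ np.array([1.0, 2.0, 1.0]) + 0.1 * rng.normal(size=40)

r, beta_plus = generalized_inverse_estimator(X, y, w=1e-6)
print(r, beta_plus)
```

With an exactly singular $X'X$ the rule picks $r = 2$, and the estimator splits the weight of the duplicated column evenly between its two copies.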


Marquardt indicates that the best $r$ can be selected by examining the size of the variance inflation factors. These are defined as the diagonal elements of $(X'X)_r^{+}$ for a pre-assigned rank $r$, $0 < r \le m$, with $X'X$ in correlation form. The criterion suggested is that the maximum variance inflation factor should be "usually larger than 1.0 but certainly not as large as 10."
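The variance inflation factors for each candidate rank can be computed as follows. This Python/NumPy sketch assumes $X'X$ is already in correlation form $C$. The two-variable example with correlation 0.95 and the function name `vif_by_rank` are hypothetical. In this toy case the full-rank maximum is just above 10 and the rank-one value is below 1, so neither $r$ meets the quoted rule exactly. This illustrates the imprecision the criterion can leave.

```python
import numpy as np

def vif_by_rank(C):
    """Diagonal of (X'X)_r^+ for r = 1, ..., m, with X'X given in correlation form C."""
    lam, J = np.linalg.eigh(C)
    order = np.argsort(lam)[::-1]           # largest eigenvalue first
    lam, J = lam[order], J[:, order]
    vifs = {}
    for r in range(1, len(lam) + 1):
        Jr = J[:, :r]
        vifs[r] = np.diag(Jr @ np.diag(1.0 / lam[:r]) @ Jr.T)
    return vifs

C = np.array([[1.0, 0.95],
              [0.95, 1.0]])
vifs = vif_by_rank(C)
for r, v in vifs.items():
    print(r, v.max())   # r = 1: about 0.256; r = 2: about 10.26
```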


In proposing the generalized inverse estimator, Marquardt has emphasized, with little reservation, its superiority over the o.l.s. estimator for non-orthogonal data. Nevertheless, it needs to be pointed out that the criticisms others have levelled against the ridge estimator apply equally to the generalized inverse estimator. Thus Marquardt's criterion for choosing the best $r$ lacks precision and, more importantly, he has not proved that $\hat{\beta}_r^{+}$ has a smaller mean square error than the o.l.s. estimator.
3.9 Mayer-Willke Shrunken Estimator


An alternative class of biased estimators, similar to the ridge estimators and labelled shrunken estimators, has recently been proposed by Mayer and Willke [39]. These estimators are defined by

$$\hat{\beta}_\lambda = (X'X + \lambda X'X)^{-1}X'y = (1 + \lambda)^{-1}\hat{\beta}, \qquad \lambda \ge 0 .$$

If $\lambda$ is a fixed scalar, $\hat{\beta}_\lambda$ is called a deterministically shrunken estimator. But if $\lambda$ is a function of $\hat{\beta}$, $\hat{\beta}_\lambda$ is referred to as a stochastically shrunken estimator.


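In their paper, Mayer and Willke outline a number of methods by which $\lambda$ can be selected. One approach involves putting

$$\lambda = \frac{c\,S}{\hat{\beta}'X'X\hat{\beta} - c\,S} \qquad (54)$$

as the shrinkage factor. Here $S = (y - X\hat{\beta})'(y - X\hat{\beta})$ is the residual sum of squares and $c$ is a positive constant. The shrunken estimator is then given by

$$\hat{\beta}_\lambda = \left[1 + \frac{c\,S}{\hat{\beta}'X'X\hat{\beta} - c\,S}\right]^{-1}\hat{\beta} = \left[1 - \frac{c\,S}{\hat{\beta}'X'X\hat{\beta}}\right]\hat{\beta} \qquad (55)$$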


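As Sclove [49] has proved, when the number of independent variables $m \ge 3$ and $0 < c < 2(m-2)(n-m+2)^{-1}$, $\hat{\beta}_\lambda$ has a smaller weighted mean square error than $\hat{\beta}$ under the minimum weighted mean square error (MWMSE) criterion. Indeed, if $c = (m-2)(n-m+2)^{-1}$, then

$$\text{MWMSE}(\hat{\beta}_\lambda) = \min_{c}\ \text{MWMSE}(\hat{\beta}_\lambda),$$

where, for any estimator $\tilde{\beta}$,

$$\text{MWMSE}(\tilde{\beta}) = E\left[(\tilde{\beta} - \beta)'X'X(\tilde{\beta} - \beta)\right].$$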
Thus the class of stochastically shrunken estimators with $\lambda$ as defined in (54) is superior to the ridge estimators or deterministically shrunken estimators. A value of $\lambda$ can be determined that guarantees a better estimator of $\beta$ than the ordinary least squares estimator $\hat{\beta}$, "betterness" being in the sense of the Minimum Weighted Mean Square Error criterion. It must be stressed, however, that shrunken estimators with other values of $\lambda$ face the same problems as ridge estimators, because they involve the estimation of $\lambda$.
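A small simulation illustrates the stochastically shrunken estimator. The Python/NumPy sketch below assumes the shrinkage factor $\lambda = cS/(\hat{\beta}'X'X\hat{\beta} - cS)$, so that $\hat{\beta}_\lambda = (1+\lambda)^{-1}\hat{\beta} = [1 - cS/(\hat{\beta}'X'X\hat{\beta})]\hat{\beta}$ with $S$ the residual sum of squares, and uses $c = (m-2)/(n-m+2)$. It checks that the two forms agree. It then compares the average weighted loss $(\tilde{\beta}-\beta)'X'X(\tilde{\beta}-\beta)$ of $\hat{\beta}_\lambda$ and $\hat{\beta}$ on made-up data with no intercept and a deliberately small true $\beta$. The function name `shrunken_estimator` and all simulation settings are assumptions. The simulation illustrates Sclove's result but does not prove it.

```python
import numpy as np

def shrunken_estimator(X, y, c=None):
    """(1 + lam)^{-1} b with lam = cS / (b'X'Xb - cS); also returns the equivalent form."""
    n, m = X.shape
    b = np.linalg.lstsq(X, y, rcond=None)[0]        # ordinary least squares
    resid = y - X @ b
    S = resid @ resid                               # residual sum of squares
    Q = b @ (X.T @ X) @ b                           # b'X'Xb
    if c is None:
        c = (m - 2) / (n - m + 2)
    lam = c * S / (Q - c * S)                       # shrinkage factor
    return b, b / (1.0 + lam), (1.0 - c * S / Q) * b

rng = np.random.default_rng(2)
n, m = 30, 5
X = rng.normal(size=(n, m))
beta = np.full(m, 0.1)                              # small true coefficients (assumption)
W = X.T @ X                                         # weight matrix of the loss

reps, loss_ols, loss_shr, forms_agree = 2000, 0.0, 0.0, True
for _ in range(reps):
    y = X @ beta + rng.normal(size=n)
    b, b_lam, b_alt = shrunken_estimator(X, y)
    forms_agree = forms_agree and np.allclose(b_lam, b_alt)
    loss_ols += (b - beta) @ W @ (b - beta)
    loss_shr += (b_lam - beta) @ W @ (b_lam - beta)

print(loss_ols / reps, loss_shr / reps)   # the shrunken average should be smaller
```

With unit error variance the least squares average should be close to $m = 5$. The shrunken estimator gains most when $\beta'X'X\beta$ is small relative to the noise, as it is here.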


3.10 Multicollinearity in Two-stage Least-squares 


The techniques discussed thus far are designed to resolve multicollinearity when it occurs in ordinary least squares estimation. As Klein and Nakamura [31] have shown, two-stage least-squares estimates are even more sensitive to the presence of multicollinearity. A remedy for such situations has been devised by Kloek and Mennes [32]. To understand their procedure, a brief discussion of two-stage least-squares is necessary.

In brief, the model concerned is

$$y = Y_1\beta + X_1\gamma + u$$

where $y$ is an $n \times 1$ vector of observations on the "dependent" variable, $Y_1$ is an $n \times \ell$ matrix of observations on the other endogenous variables present in the equation, $X_1$ is an $n \times k$ matrix of observations on the predetermined variables appearing in the equation, and $u$ is an $n \times 1$ vector of disturbances.


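Basically, two-stage least-squares estimation involves replacing $Y_1$ by its least squares estimate $\hat{Y}_1$, where

$$\hat{Y}_1 = X(X'X)^{-1}X'Y_1, \qquad X = [\,X_1 \;\; X_2\,],$$

and $X_2$ is an $n \times (K-k)$ matrix of predetermined variables not appearing in the equation. Next, $y$ is regressed on $\hat{Y}_1$ and $X_1$ to obtain the two-stage least squares estimates $\hat{\beta}$ and $\hat{\gamma}$ as

$$\begin{bmatrix} \hat{\beta} \\ \hat{\gamma} \end{bmatrix} = \begin{bmatrix} Y_1'Y_1 - V'V & Y_1'X_1 \\ X_1'Y_1 & X_1'X_1 \end{bmatrix}^{-1} \begin{bmatrix} (Y_1 - V)'y \\ X_1'y \end{bmatrix}$$

where $V$ is the $n \times \ell$ matrix of residuals from the least squares regression of $Y_1$ on $X$.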
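Two-stage least squares can be computed in two equivalent ways. One is to regress $y$ on $\hat{Y}_1$ and $X_1$. The other works from moment matrices, using $\hat{Y}_1'\hat{Y}_1 = Y_1'Y_1 - V'V$, $\hat{Y}_1'X_1 = Y_1'X_1$ and $\hat{Y}_1'y = (Y_1 - V)'y$, where $V = Y_1 - \hat{Y}_1$ is orthogonal to every column of $X$. The Python/NumPy sketch below checks that both routes agree on simulated data with one included endogenous variable. The data-generating values are arbitrary assumptions.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 200
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])      # included predetermined variables
X2 = rng.normal(size=(n, 3))                                 # excluded predetermined variables
X = np.hstack([X1, X2])
u = rng.normal(size=n)
Y1 = (X2 @ np.array([1.0, 0.5, -0.5]) + 0.8 * u + rng.normal(size=n))[:, None]  # endogenous
y = 1.5 * Y1[:, 0] + X1 @ np.array([0.5, -1.0]) + u

# First stage: fitted values and residuals of Y1 on all predetermined variables X
Y1_hat = X @ np.linalg.lstsq(X, Y1, rcond=None)[0]
V = Y1 - Y1_hat

# Route 1: regress y on [Y1_hat, X1]
coef_reg = np.linalg.lstsq(np.hstack([Y1_hat, X1]), y, rcond=None)[0]

# Route 2: moment-matrix form using V
A = np.block([[Y1.T @ Y1 - V.T @ V, Y1.T @ X1],
              [X1.T @ Y1,           X1.T @ X1]])
rhs = np.concatenate([(Y1 - V).T @ y, X1.T @ y])
coef_mom = np.linalg.solve(A, rhs)
print(coef_reg, coef_mom)   # (beta_hat, gamma_hat) from both routes
```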


As is well known, difficulties arise when the number of predetermined variables exceeds the number of observations, or when the number of degrees of freedom for the regressions is unsatisfactorily small. An attempted solution to these difficulties consists of replacing $X_2$ by a small number of its principal components. Unfortunately, as often turns out, multicollinearity may then exist between one or more of the components and some of the variables in $X_1$.

To resolve this impasse, Kloek and Mennes have suggested a number of alternative methods, each beginning with normalization of all predetermined variables.


One alternative involves utilization of the principal components of the residual when $X_2$ is regressed on $X_1$, the residual being

$$E = X_2 - X_1(X_1'X_1)^{-1}X_1'X_2 .$$

The principal components used are then given by

$$Z_j = E a_j$$

where $a_j$ is the eigenvector of $E'E$ corresponding to the eigenvalue $\mu_j$, $j = 1, \ldots, K-k$.

Multicollinearity is therefore avoided, as the following argument demonstrates:

$$X_1'Z_j = X_1'E a_j = \left[X_1'X_2 - X_1'X_1(X_1'X_1)^{-1}X_1'X_2\right]a_j = \left[X_1'X_2 - X_1'X_2\right]a_j = 0, \qquad j = 1, \ldots, K-k .$$

Each component is thus orthogonal to every variable in $X_1$.
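The orthogonality argument is easy to confirm numerically. In the Python/NumPy sketch below, the components $Z_j = Ea_j$ of the residual $E = X_2 - X_1(X_1'X_1)^{-1}X_1'X_2$ turn out orthogonal to $X_1$ up to rounding error. Ordinary principal components of $X_2$ are not orthogonal to $X_1$. The data are simulated, with $X_2$ built to lie close to the span of $X_1$. The sizes, the seed and the helper `normalize` are assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 50
X1 = rng.normal(size=(n, 2))
X2 = X1 @ rng.normal(size=(2, 4)) + 0.1 * rng.normal(size=(n, 4))   # X2 close to span(X1)

def normalize(M):
    """Centre and scale each column (the normalization step)."""
    return (M - M.mean(0)) / M.std(0)

X1, X2 = normalize(X1), normalize(X2)

# Residual of X2 regressed on X1, and its principal components Z_j = E a_j
E = X2 - X1 @ np.linalg.solve(X1.T @ X1, X1.T @ X2)
mu, A = np.linalg.eigh(E.T @ E)
Z_resid = E @ A

# Ordinary principal components of X2, for comparison
_, A2 = np.linalg.eigh(X2.T @ X2)
Z_plain = X2 @ A2

print(np.abs(X1.T @ Z_resid).max())   # rounding-error size
print(np.abs(X1.T @ Z_plain).max())   # large: plain components overlap X1
```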


A second alternative involves selecting those components with the greatest $\phi_j$, defined as

$$\phi_j = (1 - R_j^2)\,\lambda_j$$

Here $R_j$ is the multiple correlation coefficient when $X_2 a_j$, the $j$-th principal component of $X_2$, is regressed on $X_1$, and $\lambda_j$ is the $j$-th eigenvalue of $(X_2'X_2)$, with $a_j$ the corresponding unit eigenvector.

Multicollinearity is resolved since

$$(1 - R_j^2)\,\lambda_j = a_j'X_2'\left[I - X_1(X_1'X_1)^{-1}X_1'\right]X_2 a_j . \qquad (56)$$

The right side of (56) is the residual sum of squares when $X_2 a_j$ is regressed on $X_1$. In other words, the components chosen are those which are least correlated with $X_1$.
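The equality of $(1 - R_j^2)\lambda_j$ and the residual sum of squares of the $j$-th component on $X_1$ can be checked directly. Because the eigenvectors $a_j$ have unit length, $a_j'X_2'X_2a_j = \lambda_j$. The Python/NumPy sketch below uses simulated data and the uncentred $R_j^2$. The data, seed and the names `phi`, `rss` and `keep` are assumptions. It then keeps the two components with the largest $(1 - R_j^2)\lambda_j$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 60
X1 = rng.normal(size=(n, 2))
X2 = np.hstack([X1 @ rng.normal(size=(2, 2)), rng.normal(size=(n, 2))]) + 0.2 * rng.normal(size=(n, 4))

lam, A = np.linalg.eigh(X2.T @ X2)                        # unit eigenvectors a_j, eigenvalues lambda_j
M1 = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)    # residual-maker for X1

phi, rss = [], []
for j in range(len(lam)):
    zj = X2 @ A[:, j]                      # j-th principal component; zj'zj = lambda_j
    rss_j = zj @ M1 @ zj                   # residual sum of squares of zj on X1
    R2_j = 1.0 - rss_j / (zj @ zj)         # uncentred squared multiple correlation
    phi.append((1.0 - R2_j) * lam[j])
    rss.append(rss_j)

phi, rss = np.array(phi), np.array(rss)
keep = np.argsort(phi)[::-1][:2]           # the two components least related to X1
print(np.round(phi, 3), keep)
```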
CHAPTER IV


AN EMPIRICAL STUDY OF MULTICOLLINEARITY 


Our discussion in the preceding chapters has presented an 
overview of the problems, detection and correction of multicollinearity 
in regression analysis. To illustrate the practical implications of 
the theory reviewed, we have empirically investigated multicollinearity 
in economic data related to inflation, a problem of considerable 
current interest and significance. In essence, we will attempt to 
apply the Farrar-Glauber techniques to the most recent data available 
concerning price changes in relation to the rate of change in wages 
and certain other contributing factors. This is followed by the 
construction of Hoerl and Kennard's Ridge Trace to visually demonstrate 
the harmful effects of multicollinearity in our sample of data. Finally, 
Mayer and Willke's shrunken estimator is calculated to remedy the 


detected multicollinearity. 


4.1 Description of the Model 


The economic model we employ is that found in Special Study No. 5, published by the Economic Council of Canada in 1965. This study estimated the price change equation by fitting regressions to quarterly data over the period 1949:1 to 1965:2. For the purposes of our investigation, the same relationship is fitted to quarterly data collected for the period 1959:1 to 1972:4.
We have the model

$$P_t = \beta_0 + \beta_1 W_t + \beta_2 F_t + \beta_3 P_{US,t} + \beta_4 P_{t-1} + \beta_5 P_{t-2} + u_t$$

where

$P_t$ = percentage change in the Consumer Price Index between the current quarter and the same quarter of the preceding year (DBS: 62-002).
$W_t$ = percentage change, over the same four-quarter span, in average hourly earnings of production workers in manufacturing (DBS: 72-204).
$F_t$ = percentage change in the implicit deflator for imports of goods and services in the National Accounts (DBS: 13-001).
$P_{US,t}$ = percentage rate of change in the U.S. Consumer Price Index (Labour Review, U.S. Department of Labour).
$P_{t-1}$ = the value of $P_t$ in the immediately preceding quarter.
$P_{t-2}$ = $P_t$ lagged two quarters.
4.2 Application of Farrar-Glauber technique

Using the least-squares computer program MLREGR, the estimated relationship is obtained as

$$P_t = -0.44201 + 0.24887\,W_t + 0.11055\,F_t - \ldots\,P_{US,t} + 0.87283\,P_{t-1} - 0.09112\,P_{t-2} \qquad (57)$$

The t-ratios of the slope coefficients are 2.99345, 1.91801, -1.83583, 6.22947 and -0.59336 respectively. The squared multiple correlation coefficient is

$$R^2 = 0.87400 .$$

The matrix of simple correlation coefficients between the independent variables, in the order $W_t$, $F_t$, $P_{US,t}$, $P_{t-1}$, $P_{t-2}$, is

$$C = \begin{bmatrix}
1.00000 & -0.07353 & 0.89543 & 0.81324 & 0.83302 \\
-0.07353 & 1.00000 & 0.05934 & 0.02715 & -0.04068 \\
0.89543 & 0.05934 & 1.00000 & 0.78036 & 0.82513 \\
0.81324 & 0.02715 & 0.78036 & 1.00000 & 0.91583 \\
0.83302 & -0.04068 & 0.82513 & 0.91583 & 1.00000
\end{bmatrix}$$

Since $|C| = 0.00724$, substantial multicollinearity exists among the independent variables. The Farrar-Glauber statistic $\chi^2 = -[n - 1 - \frac{1}{6}(2m+5)]\ln|C|$, with $n = 52$ observations and $m = 5$ independent variables, is calculated and is equal to

$$-\left[52 - 1 - \tfrac{1}{6}(15)\right](-4.9282) = 239.018,$$

which, with $\frac{1}{2}m(m-1) = 10$ degrees of freedom, is highly significant.
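The determinant and the chi-square statistic can be reproduced from the correlation matrix. The Python/NumPy sketch below recomputes $|C|$ and $\chi^2 = -[n - 1 - \frac{1}{6}(2m+5)]\ln|C|$ with $n = 52$ and $m = 5$. Two assumptions should be flagged. The first is the variable order $W_t, F_t, P_{US,t}, P_{t-1}, P_{t-2}$. The second is that the entries $-0.07353$ and $0.82513$ are best readings of partly illegible values, chosen because they reproduce $|C| = 0.00724$ and the reported $R_i^2$ of $W_t$ and $F_t$.

```python
import numpy as np

# Correlation matrix of the independent variables, order: W, F, P_US, P(t-1), P(t-2).
C = np.array([
    [ 1.00000, -0.07353,  0.89543,  0.81324,  0.83302],
    [-0.07353,  1.00000,  0.05934,  0.02715, -0.04068],
    [ 0.89543,  0.05934,  1.00000,  0.78036,  0.82513],
    [ 0.81324,  0.02715,  0.78036,  1.00000,  0.91583],
    [ 0.83302, -0.04068,  0.82513,  0.91583,  1.00000],
])
n, m = 52, 5                                        # observations and independent variables

det_C = np.linalg.det(C)
chi2 = -(n - 1 - (2 * m + 5) / 6) * np.log(det_C)   # Farrar-Glauber statistic
df = m * (m - 1) // 2                               # degrees of freedom
print(f"|C| = {det_C:.5f}, chi2 = {chi2:.1f} on {df} d.f.")
```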
By regressing each independent variable on the remaining ones, we obtain the following values of the squared multiple correlation coefficient $R_i^2$ and the associated $F$ statistic:

Variable      W_t      F_t      P_US,t   P_t-1    P_t-2
R_i^2         0.850    0.127    0.839    0.854    0.876
F             66.583   1.709    61.231   68.295   83.008


The coefficients of partial correlation between pairs of independent variables and the associated t-ratios are calculated and shown in Table VI.

TABLE VI

Partial Correlation Coefficients $r_{ij}$ (above the diagonal) and Associated t-ratios (below the diagonal) between Pairs of Variables, with $R_i^2$ on the Diagonal

            W_t      F_t      P_US,t   P_t-1    P_t-2
W_t         0.850   -0.286    0.691    0.254    0.039
F_t        -2.046    0.127    0.314    0.207    0.192
P_US,t      6.553    2.269    0.839   -0.116    0.303
P_t-1       1.800    1.483   -0.806    0.854    0.739
P_t-2       0.268    1.367    …        …        0.876
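The $R_i^2$, $F$ and partial correlation figures all follow from the inverse of the correlation matrix. If $c^{ij}$ denotes the elements of $C^{-1}$, then $R_i^2 = 1 - 1/c^{ii}$ and $r_{ij} = -c^{ij}/\sqrt{c^{ii}c^{jj}}$. The Python/NumPy sketch below assumes the variable order $W_t, F_t, P_{US,t}, P_{t-1}, P_{t-2}$, with $r_{12} = -0.07353$ and $r_{35} = 0.82513$ taken as best readings of partly illegible entries. It also assumes $F$ on $(m-1, n-m)$ degrees of freedom and $t = r_{ij}\sqrt{n-m}/\sqrt{1-r_{ij}^2}$. These choices appear consistent with the tabulated values but are not stated in the text.

```python
import numpy as np

# Correlation matrix, order: W, F, P_US, P(t-1), P(t-2).
C = np.array([
    [ 1.00000, -0.07353,  0.89543,  0.81324,  0.83302],
    [-0.07353,  1.00000,  0.05934,  0.02715, -0.04068],
    [ 0.89543,  0.05934,  1.00000,  0.78036,  0.82513],
    [ 0.81324,  0.02715,  0.78036,  1.00000,  0.91583],
    [ 0.83302, -0.04068,  0.82513,  0.91583,  1.00000],
])
n, m = 52, 5
Cinv = np.linalg.inv(C)
d = np.diag(Cinv)

R2 = 1.0 - 1.0 / d                                   # R_i^2 of each variable on the rest
F = (R2 / (m - 1)) / ((1.0 - R2) / (n - m))          # F with (m-1, n-m) d.f.

partial = -Cinv / np.sqrt(np.outer(d, d))            # partial correlations r_ij
np.fill_diagonal(partial, 0.0)                        # diagonal is not used
t = partial * np.sqrt(n - m) / np.sqrt(1.0 - partial ** 2)

print(np.round(R2, 3))
print(np.round(F, 3))
print(np.round(partial, 3))
```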
In the estimated relationship (57), the coefficient of the variable $P_{US,t}$ is observed to have a negative sign. This result seems unlikely, as one could reasonably expect an upward drift in U.S. prices to be accompanied by a similar drift in Canadian prices. From the small value of $|C|$, one might suspect that multicollinearity is the cause of this phenomenon.

The squared multiple correlation coefficient $R^2$ indicates that 87% of the total variation in the rate of change of the Consumer Price Index can be explained by the regression equation. From the $F$ values of the independent variables, one may deduce that $W_t$, $P_{US,t}$, $P_{t-1}$ and $P_{t-2}$ are affected by multicollinearity. Indeed, Table VI shows a linkage between $W_t$ and $P_{US,t}$, and another between $P_{t-1}$ and $P_{t-2}$.


4.3 Ridge Analysis of the Data 


Figures 2 and 3 represent the Ridge Trace obtained by applying Ridge Regression to our set of economic data.

FIGURE 2
Ridge Trace for Inflation Data

Apparent from the Ridge Trace constructed are the following results:

(1) over-estimation of the coefficients of all the variables when using the least squares estimator is clearly evident.

(2) when $k = 0$, the coefficients of the variables $P_{US,t}$ and $P_{t-2}$ have negative signs, which move quickly to zero upon the addition of $k > 0$ and subsequently become positive. From this, we deduce that these two coefficients have the wrong signs in the original estimated relationship.

(3) instability characterizes the coefficients of variables $W_t$, $P_{US,t}$, $P_{t-1}$ and $P_{t-2}$, indicating the presence of multicollinearity among these variables. We note that the same result is obtained using Farrar-Glauber's technique.

(4) the stabilization of the system is observed to occur at a value of $k$ in the interval (0.5, 0.7). According to Hoerl and Kennard, coefficients obtained with such a value of $k$ afford more stable prediction than the least squares estimator.
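A Ridge Trace of the kind described can be tabulated with a few lines of Python/NumPy. The sketch below standardizes the data so that $X'X$ is a correlation matrix, then evaluates $(X'X + kI)^{-1}X'y$ over a grid of $k$. The simulated data (two nearly collinear regressors plus one independent one), the grid and the function name `ridge_trace` are assumptions, not the inflation data. Plotting each column of `trace` against `ks` gives the trace.

```python
import numpy as np

def ridge_trace(X, y, ks):
    """Standardized ridge coefficients (X'X + kI)^{-1} X'y, one row per k."""
    n, p = X.shape
    Xs = (X - X.mean(0)) / (X.std(0) * np.sqrt(n))   # X'X becomes a correlation matrix
    ys = (y - y.mean()) / (y.std() * np.sqrt(n))
    R, r = Xs.T @ Xs, Xs.T @ ys
    return np.array([np.linalg.solve(R + k * np.eye(p), r) for k in ks])

rng = np.random.default_rng(5)
n = 52
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)        # nearly collinear with x1
x3 = rng.normal(size=n)
X = np.column_stack([x1, x2, x3])
y = X @ np.array([1.0, 1.0, 0.5]) + rng.normal(size=n)

ks = np.linspace(0.0, 1.0, 11)
trace = ridge_trace(X, y, ks)
for k, row in zip(ks, trace):
    print(f"k = {k:.1f}: {np.round(row, 3)}")
```

The length of the coefficient vector can only shrink as $k$ increases. The collinear pair typically shows the largest swings near $k = 0$, which is the pattern read off in items (1) to (3).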


4.4 Calculation of Mayer-Willke Shrunken Estimator 


Following the procedure of Mayer and Willke, the shrunken estimator for our set of data is calculated (with $c = (m-2)(n-m+2)^{-1}$) as

$$\hat{\beta}_\lambda = \begin{bmatrix} 0.40845 \\ 0.10838 \\ -0.24158 \\ 0.86099 \\ -0.08891 \end{bmatrix}.$$

It is observed that the shrunken estimator obtained does not correspond to the ridge estimator calculated using a value of $k$ in the range (0.5, 0.7). As we recall, Sclove has shown that the shrunken estimator has a smaller minimum weighted mean square error than the least squares estimator. On the other hand, for lack of a rigorous proof, the superiority of the ridge estimator over the least squares estimator is still an issue under debate. Pending the resolution of this controversy, it would therefore seem reasonable to employ the shrunken estimator rather than the ridge estimator as a remedy for multicollinearity in our set of data.
APPENDIX I


INVERTING ILL-CONDITIONED MATRICES 


A well known difficulty in solving the least-squares equations stems from the need to invert $X'X$, a matrix which is often ill-conditioned. To circumvent this difficulty, a common procedure lies in the following process of iterative refinement proposed by Wilkinson [61]. The normal equations are written as $(X'X)\beta = X'y$. The sequence of vectors $\beta^{(s)}$, $s = 0, 1, 2, \ldots$, defined by $\beta^{(0)} = 0$ and

$$r^{(s)} = X'y - (X'X)\beta^{(s)}, \qquad (X'X)\,\delta\beta^{(s)} = r^{(s)}, \qquad \beta^{(s+1)} = \beta^{(s)} + \delta\beta^{(s)},$$

is then computed. In the computation of the residual $r^{(s)}$, double precision accumulation of inner products is employed. All other steps are carried out with single precision.
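Wilkinson's scheme can be imitated with NumPy. Each correction is solved in single precision (`float32`), while the residual $r = b - Ax$ is accumulated in double precision (`float64`). The update is then $x \leftarrow x + d$ with $Ad = r$. In the sketch below a 5 × 5 Hilbert matrix stands in for an ill-conditioned $X'X$, which is an assumption for illustration. For brevity each correction uses a fresh single-precision solve rather than reusing a stored factorization, as a real implementation would.

```python
import numpy as np

def refine(A, b, iters=20):
    """Iterative refinement: corrections solved in float32, residuals in float64."""
    A32 = A.astype(np.float32)
    x = np.zeros(len(b))                                  # x^(0) = 0
    for _ in range(iters):
        r = b - A @ x                                     # residual, double precision
        d = np.linalg.solve(A32, r.astype(np.float32))    # correction, single precision
        x = x + d.astype(np.float64)
    return x

n = 5
H = np.array([[1.0 / (i + j + 1) for j in range(n)] for i in range(n)])  # Hilbert matrix
x_true = np.ones(n)
b = H @ x_true

x_single = np.linalg.solve(H.astype(np.float32), b.astype(np.float32)).astype(np.float64)
x_refined = refine(H, b)
err_single = np.abs(x_single - x_true).max()
err_refined = np.abs(x_refined - x_true).max()
print(err_single, err_refined)
```

The first pass reproduces the plain single-precision answer. Later passes converge because the condition number of this matrix times single-precision rounding is well below one, matching the warning below that refinement needs a sufficiently accurate start.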


Recent years have seen the development of various algorithms 
aimed at obtaining more accurate solutions. Some of the more successful 
methods are those of Businger [7], Martin, Peters and Wilkinson [37, 38], 
Businger and Golub's procedure employs orthogonal Householder transformations. Since length is invariant under orthogonal transformation,

$$\|y - X\beta\| = \|Qy - QX\beta\|$$

for any orthogonal $Q$, the least squares problem reduces to that of minimizing $\|Qy - QX\beta\|$. $Q$ is chosen in such a way that

$$QX = \begin{bmatrix} R \\ 0 \end{bmatrix} \qquad (\text{I-1})$$

where $R$ is a $k \times k$ upper triangular matrix and $0$ is the $(n-k) \times k$ zero matrix. The decomposition in (I-1) can be accomplished efficiently by the Householder transformation [26], and clearly

$$\hat{\beta} = R^{-1}(Qy)_k$$

where $(Qy)_k$ is the first $k$ components of $Qy$.


Once an initial solution has been obtained, it may be improved to considerable accuracy by the process of iterative refinement. Iteration is continued as long as improved estimates of $\beta$ can be obtained. The iterative technique should be used only if the initial approximation is sufficiently accurate; otherwise the iteration will not converge.
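The Businger-Golub approach can be sketched in Python/NumPy with explicit Householder reflections. Each reflection $H = I - 2vv'$ is applied to the remaining columns of $X$ and to $y$. After that, $R\hat{\beta} = (Qy)_k$ is solved by back-substitution. This is a teaching sketch on made-up data, not the published procedure, and `householder_lstsq` is a hypothetical name. In practice a library routine such as `numpy.linalg.lstsq` would be used.

```python
import numpy as np

def householder_lstsq(X, y):
    """Least squares by Householder reflections followed by back-substitution."""
    A = np.array(X, dtype=float)
    b = np.array(y, dtype=float)
    n, k = A.shape
    for j in range(k):
        x = A[j:, j]
        alpha = np.linalg.norm(x)
        if alpha == 0.0:
            continue
        v = x.copy()
        v[0] += np.copysign(alpha, x[0])          # sign choice avoids cancellation
        v /= np.linalg.norm(v)
        # apply H = I - 2vv' to the trailing rows of A and of b
        A[j:, j:] -= 2.0 * np.outer(v, v @ A[j:, j:])
        b[j:] -= 2.0 * v * (v @ b[j:])
    R = np.triu(A[:k, :k])                         # QX = [R; 0]
    qy = b[:k]                                     # first k components of Qy
    beta = np.zeros(k)
    for i in range(k - 1, -1, -1):                 # back-substitution R beta = (Qy)_k
        beta[i] = (qy[i] - R[i, i + 1:] @ beta[i + 1:]) / R[i, i]
    return beta

rng = np.random.default_rng(6)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
beta = householder_lstsq(X, y)
print(np.allclose(beta, np.linalg.lstsq(X, y, rcond=None)[0]))   # True
```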


The method of Martin, Peters and Wilkinson decomposes the symmetric, positive definite matrix $X'X$ into $LL'$, where $L$ is a non-singular lower triangular matrix. The elements of $L$ are obtained by the Cholesky decomposition [12] and then used to obtain the least squares solution. Since $X'X = LL'$, the normal equations $(X'X)\beta = X'y$ can be solved by the forward substitution $Lz = X'y$ followed by the back substitution $L'\beta = z$. In each iteration of the refinement process, $\beta$ is improved by a correction $\delta\beta^{(s)}$ that is determined using the computed $LL'$ factorization. Ground rules for iteration are again as laid down earlier.
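The sketch below solves the normal equations through $X'X = LL'$: forward substitution $Lz = X'y$, then back substitution $L'\beta = z$. It then applies a few refinement corrections that reuse the same factor $L$. The Python/NumPy code assumes $X'X$ is positive definite so that the Cholesky factor exists. Unlike the published procedure, all arithmetic here is in double precision. The helper names are hypothetical.

```python
import numpy as np

def forward_sub(L, b):
    """Solve L z = b for lower triangular L."""
    z = np.zeros(len(b))
    for i in range(len(b)):
        z[i] = (b[i] - L[i, :i] @ z[:i]) / L[i, i]
    return z

def back_sub(U, z):
    """Solve U x = z for upper triangular U."""
    x = np.zeros(len(z))
    for i in range(len(z) - 1, -1, -1):
        x[i] = (z[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]
    return x

def cholesky_lstsq(X, y, iters=3):
    """Normal equations via X'X = LL', plus a few refinement steps reusing L."""
    A, g = X.T @ X, X.T @ y
    L = np.linalg.cholesky(A)                      # lower triangular, A = L L'

    def solve(rhs):
        return back_sub(L.T, forward_sub(L, rhs))

    beta = solve(g)
    for _ in range(iters):
        beta = beta + solve(g - A @ beta)          # correction from the current residual
    return beta

rng = np.random.default_rng(7)
X = rng.normal(size=(25, 4))
y = rng.normal(size=25)
beta = cholesky_lstsq(X, y)
print(beta)
```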


Bauer has formulated an ALGOL procedure in which $X$ is decomposed into $GDB$, where $G$ consists of orthogonal non-zero columns, $D = (G'G)^{-1}$, and $B$ is upper triangular. The condition

$$G'(y - X\beta) = 0$$

yields the triangular system

$$B\beta = G'y$$

which is then solved by back-substitution.


The procedure devised by Björck requires decomposing $X$ into

$$X = VC$$

where $C$ is unit upper triangular and $V'V$ is diagonal. To accomplish the decomposition, Björck uses a modification of the Gram-Schmidt orthogonalization process. This differs from the classical process in that the elements of $C$ are computed one row instead of one column at a time. Once the decomposition is realized, the least squares solution is given by

$$C\beta = (V'V)^{-1}V'y,$$

a unit upper triangular system solved by back-substitution. Iterative refinement is then carried out in the usual way.
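The row-oriented decomposition can be sketched in Python/NumPy. At step $i$, row $i$ of $C$ is computed, and every remaining column is orthogonalized against column $i$ of $V$ immediately. This gives $X = VC$ with $C$ unit upper triangular and $V'V$ diagonal. The least squares solution then follows from $V'(y - VC\beta) = 0$, that is $C\beta = (V'V)^{-1}V'y$. The data and function names are assumptions, and the sketch omits Björck's iterative refinement.

```python
import numpy as np

def mgs_vc(X):
    """Modified Gram-Schmidt: X = V C, C unit upper triangular, V'V diagonal."""
    V = np.array(X, dtype=float)
    k = V.shape[1]
    C = np.eye(k)
    for i in range(k):
        vi = V[:, i]
        di = vi @ vi
        for j in range(i + 1, k):                  # row i of C, then update column j
            C[i, j] = (vi @ V[:, j]) / di
            V[:, j] -= C[i, j] * vi
    return V, C

def mgs_lstsq(X, y):
    V, C = mgs_vc(X)
    rhs = (V.T @ y) / np.einsum('ij,ij->j', V, V)  # (V'V)^{-1} V'y
    beta = np.zeros(C.shape[0])
    for i in range(len(beta) - 1, -1, -1):         # back-substitution, unit diagonal
        beta[i] = rhs[i] - C[i, i + 1:] @ beta[i + 1:]
    return beta

rng = np.random.default_rng(8)
X = rng.normal(size=(20, 4))
y = rng.normal(size=20)
V, C = mgs_vc(X)
beta = mgs_lstsq(X, y)
print(beta)
```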


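Björck has also proposed utilizing the fact that the residual $r = y - X\beta$ is orthogonal to the columns of $X$. His procedure therefore considers the augmented system

$$\begin{bmatrix} I & X \\ X' & 0 \end{bmatrix}\begin{bmatrix} r \\ \beta \end{bmatrix} = \begin{bmatrix} y \\ 0 \end{bmatrix}.$$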
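The augmented system consists of the two conditions $r + X\beta = y$ and $X'r = 0$. Solving it directly delivers both the residual $r$ and the least squares $\hat{\beta}$ at once. The dense NumPy solve below is only a check on simulated data, which is an assumption. It is not Björck's three-stage refinement procedure.

```python
import numpy as np

rng = np.random.default_rng(9)
n, k = 15, 3
X = rng.normal(size=(n, k))
y = rng.normal(size=n)

# Augmented system [[I, X], [X', 0]] [r; beta] = [y; 0]
K = np.block([[np.eye(n), X],
              [X.T, np.zeros((k, k))]])
sol = np.linalg.solve(K, np.concatenate([y, np.zeros(k)]))
r, beta = sol[:n], sol[n:]
print(beta, np.abs(X.T @ r).max())   # residual is orthogonal to the columns of X
```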


In his three-stage iterative refinement procedure, Björck begins by computing the residuals
In conclusion, it need only be said that these procedures attain their objective highly satisfactorily. Either inversion of the ill-conditioned matrix is achieved to working accuracy, or the system is found to be too ill-conditioned to be solved without working to higher precision. No attempt, however, has yet been made to compare the procedures with respect to computer time required, applicability, storage requirements or program output. A line of numerical analysis research may well be fruitfully pursued towards such a comparison.
APPENDIX II


PROGRAM A 


PROGRAM TO CALCULATE MATRIX U OF KIRCHDORFER PROCEDURE 
C     GRAM-SCHMIDT ORTHONORMALIZATION OF THE COLUMNS (1, X11, X12, X13).
C     U(I,J) IS THE PROJECTION OF COLUMN J ON ORTHONORMAL VECTOR I, SO
C     U IS UPPER TRIANGULAR (AT MOST 20 OBSERVATIONS).
      DIMENSION X11(20),X12(20),X13(20),EI1(20),DI1(20),EI2(20),
     1          DI2(20),EI3(20),SQR(20)
C     READ THE NUMBER OF OBSERVATIONS L
      READ(5,10) L
   10 FORMAT(I2)
C     READ THE THREE REGRESSORS, ONE OBSERVATION PER CARD
      DO 15 IL=1,L
      READ(5,20) X11(IL),X12(IL),X13(IL)
   20 FORMAT(3F6.3)
   15 CONTINUE
C     COLUMN 0 IS THE CONSTANT: NORM SQRT(L), UNIT VECTOR ELEMENTS DI0
      U00=SQRT(1.*L)
      DI0=1./U00
      CALL SUMN(DI0,X11,L,U01)
      CALL SUMN(DI0,X12,L,U02)
      CALL SUMN(DI0,X13,L,U03)
C     COLUMN 1: REMOVE THE CONSTANT COMPONENT, THEN NORMALIZE
      DO 2 I2=1,L
    2 EI1(I2)=X11(I2)-U01*DI0
      CALL SUMM(EI1,EI1,L,SSS)
      U11=SQRT(SSS)
      DO 4 I4=1,L
    4 DI1(I4)=EI1(I4)/U11
      CALL SUMM(DI1,X12,L,U12)
      CALL SUMM(DI1,X13,L,U13)
C     COLUMN 2: REMOVE PROJECTIONS ON DI0 AND DI1, THEN NORMALIZE
      SUM=0.
      DO 5 I=1,L
      EI2(I)=X12(I)-U02*DI0-U12*DI1(I)
      SQR(I)=EI2(I)**2
    5 SUM=SUM+SQR(I)
      U22=SQRT(SUM)
      DO 40 J=1,L
   40 DI2(J)=EI2(J)/U22
      CALL SUMM(DI2,X13,L,U23)
C     COLUMN 3: REMOVE PROJECTIONS ON DI0, DI1 AND DI2
      SS=0.
      DO 60 M=1,L
      EI3(M)=X13(M)-U03*DI0-U13*DI1(M)-U23*DI2(M)
   60 SS=SS+EI3(M)**2
      U33=SQRT(SS)
C     PRINT THE INTERMEDIATE VECTORS
      WRITE(6,123)
  123 FORMAT('1',12X,'EI1',12X,'DI1',12X,'EI2',12X,'DI2',12X,'EI3'/)
      DO 70 N=1,L
      WRITE(6,99) EI1(N),DI1(N),EI2(N),DI2(N),EI3(N)
   99 FORMAT(' ',5F15.5)
   70 CONTINUE
C     ELEMENTS BELOW THE DIAGONAL OF U ARE ZERO
      U10=0.
      U20=0.
      U21=0.
      U30=0.
      U31=0.
      U32=0.
      WRITE(6,999)
  999 FORMAT('0','MATRIX U IS'//)
      WRITE(6,991) U00,U01,U02,U03
  991 FORMAT(' ',4F10.5//)
      WRITE(6,992) U10,U11,U12,U13
  992 FORMAT(' ',4F10.5//)
      WRITE(6,993) U20,U21,U22,U23
  993 FORMAT(' ',4F10.5//)
      WRITE(6,994) U30,U31,U32,U33
  994 FORMAT(' ',4F10.5//)
      WRITE(6,1234)
 1234 FORMAT('1')
      STOP
      END
C
      SUBROUTINE SUMM(B,X,L,S)
C     INNER PRODUCT: S = SUM OF B(I)*X(I), I = 1,...,L
      DIMENSION B(20),X(20)
      S=0.
      DO 1 I=1,L
    1 S=S+B(I)*X(I)
      RETURN
      END
C
      SUBROUTINE SUMN(BB,X,L,S)
C     SCALAR BB TIMES THE SUM OF X(I), I = 1,...,L
      DIMENSION X(20)
      S=0.
      DO 1 I=1,L
    1 S=S+BB*X(I)
      RETURN
      END